THERMOSTABLE PHOSPHATASES



This invention relates to newly identified
polynucleotides, polypeptides encoded by such 
polynucleotides, the use of such polynucleotides and 
polypeptides, as well as the production and isolation of such 
polynucleotides and polypeptides. More particularly, the 
polynucleotides and polypeptides of the present invention 
have been identified as thermostable alkaline phosphatases. 

BACKGROUND OF THE INVENTION
Phosphatases are a group of enzymes that remove 
phosphate groups from organophosphate ester compounds. There
are numerous phosphatases, including alkaline phosphatases, 
phosphodiesterases and phytases. 

Alkaline phosphatases are a widely distributed group of
enzymes which hydrolyze organic phosphate ester bonds at
alkaline pH.

Phosphodiesterases are capable of hydrolyzing nucleic
acids by hydrolyzing the phosphodiester bridges of DNA and
RNA. The classification of phosphodiesterases depends upon
which side of the phosphodiester bridge is attacked. The 3'
enzymes specifically hydrolyze the ester linkage between the
3' carbon and the phosphoric group whereas the 5' enzymes
hydrolyze the ester linkage between the phosphoric group and
the 5' carbon of the phosphodiester bridge. The best known
of the class 3' enzymes is a phosphodiesterase from the venom
of the rattlesnake or of Russell's viper, which hydrolyzes
all the 3' bonds in either RNA or DNA, liberating nearly all
the nucleotide units as nucleoside 5' phosphates. This
enzyme requires a free 3' hydroxyl group on the terminal
nucleotide residue and proceeds stepwise from that end of the
polynucleotide chain. This enzyme and all other nucleases
which attack only at the ends of the polynucleotide chains
are called exonucleases. The 5' enzymes are represented by
a phosphodiesterase from bovine spleen, also an exonuclease,
which hydrolyzes all the 5' linkages of both DNA and RNA and
thus liberates only nucleoside 3' phosphates. It begins its
attack at the end of the chain having a free 3' hydroxyl
group.

Phytases are enzymes which recently have been introduced 
to commerce. The phytase enzyme removes phosphate from 
phytic acid (inositol hexaphosphoric acid) , a compound found 
in plants such as corn, wheat and rice. The enzyme has 
commercial use for the treatment of animal feed, making the 
inositol of the phytic acid available for animal nutrition. 
Aspergillus ficuum and wheat are sources of phytase
(Business Communications Co., Inc., 25 Van Zant Street,
Norwalk, CT 06855).

Phytase is used to improve the utilization of natural 
phosphorus in animal feed. Use of phytase as a feed additive 
enables the animal to metabolize a larger degree of its 
cereal feed's natural mineral content thereby reducing or 
altogether eliminating the need for synthetic phosphorus 
additives. More important than the reduced need for 
phosphorus additives is the corresponding reduction of 
phosphorus in pig and chicken waste. Many European countries 
severely limit the amount of manure that can be spread per 
acre due to concerns regarding phosphorus contamination of 
ground water. This is highly important in northern Europe, 
and will eventually be regulated throughout the remainder of 
the European Continent and the United States as well. 
(Excerpts from Business Trend Analysts, Inc., January 1994,
Frost and Sullivan Report 1995 and USDA on-line information.) 
Alkaline phosphatase hydrolyzes monophosphate esters,
releasing an inorganic phosphate and the cognate alcohol
compound. It is non-specific with respect to the alcohol
moiety, and it is this feature which accounts for the many
uses of this enzyme. The enzyme has a pH optimum between 9
and 10; however, it can also function at neutral pH (study
of the enzyme industry conducted by Business Communications
Company, Inc., 25 Van Zant Street, Norwalk, Connecticut
06855, 1995).

Thermostable alkaline phosphatases are not irreversibly 
inactivated even when heated to 60°C or more for brief
periods of time, as, for example, in the practice of 
hydrolyzing monophosphate esters. 

Alkaline phosphatases may be obtained from numerous
thermophilic organisms, such as Ammonifex degensii, Aquifex
pyrophilus, Archaeoglobus lithotrophicus, Methanococcus
igneus, Pyrolobus (a Crenarchaeota), Pyrococcus and
Thermococcus, which are mostly Eubacteria and Euryarchaeota.
Many of these organisms grow at temperatures up to about
103°C and are unable to grow below 70°C. These anaerobes are
isolated from extreme environments. For example,
Thermococcus CL-2 was isolated from a worm residing on a
"black smoker" sulfide structure.

Interest in alkaline phosphatases from thermophilic
microbes has increased recently due to their value for
commercial applications. Two sources of alkaline
phosphatases dominate and compete commercially: (i) animal,
from bovine and calf intestinal mucosa, and (ii) bacterial,
from E. coli. Due to the high turnover number of calf
intestinal phosphatase, it is often selected as the label in
many enzyme immunoassays. The usefulness of calf alkaline
phosphatase, however, is limited by its inherently low
thermostability, which is even further compromised during the
chemical preparation of the enzyme:antibody conjugates.
Bacterial alkaline phosphatase is an alternative to calf
alkaline phosphatase due to bacterial alkaline phosphatase's
extreme thermotolerance at temperatures as high as 95°C
(Tomazic-Allen, S.J., Recombinant Bacterial Phosphatase as an
Immunodiagnostic Enzyme, Annales de Biologie Clinique,
49(5):287-90 (1991)); however, the enzyme has a very low
turnover number.

There is a need for novel phosphatase enzymes having 
enhanced thermostability. This includes a need for 
thermostable alkaline phosphatases whose enhanced 
thermostability is beneficial in enzyme labeling processes 
and certain recombinant DNA techniques, such as in the 
dephosphorylation of vector DNA prior to insert DNA ligation. 
Recombinant phosphatase enzymes provide the proteins in a 
format amenable to efficient production of pure enzyme, which 
can be utilized in a variety of applications as described 
herein. Accordingly, there is a need for the
characterization, amino acid sequencing, DNA sequencing, and
heterologous expression of thermostable phosphatase enzymes.
The present invention meets these needs by providing DNA and
amino acid sequence information and expression and
purification protocols for thermostable phosphatases derived
from several organisms.

SUMMARY OF THE INVENTION 
The present invention provides thermostable phosphatases 
from several organisms. In accordance with one aspect of 
the present invention, there are provided novel enzymes, as 
well as active fragments, analogs and derivatives thereof. 

In accordance with another aspect of the present 
invention, there are provided isolated nucleic acid molecules 
encoding the enzymes of the present invention, including
mRNAs, cDNAs, genomic DNAs, as well as active analogs and
fragments of such nucleic acids.

In accordance with another aspect of the present 
invention, there are provided isolated nucleic acid molecules 
encoding mature enzymes expressed by the DNA contained in the 
plasmid DNA vector deposited with the ATCC as Deposit No. 
97536 on May 10, 1996. 

In accordance with a further aspect of the present
invention, there is provided a process for producing such 
polypeptides by recombinant techniques comprising culturing 
recombinant prokaryotic and/or eukaryotic host cells, 
containing a nucleic acid sequence of the present invention, 
under conditions promoting expression of said enzymes and 
subsequent recovery of said enzymes. 

In accordance with yet a further aspect of the present
invention, there is provided a process for utilizing such
enzymes for hydrolyzing monophosphate ester bonds, as an
enzyme label in immunoassays, for removing 5' phosphate prior
to end-labeling, and for dephosphorylating vectors prior to
insert ligation.

In accordance with yet a further aspect of the present
invention, there are also provided nucleic acid probes
comprising nucleic acid molecules of sufficient length to
specifically hybridize to a nucleic acid sequence of the
present invention.

In accordance with yet a further aspect of the present
invention, there is provided a process for utilizing such
enzymes, or polynucleotides encoding such enzymes, for in
vitro purposes related to scientific research, for example,
to generate probes for identifying similar sequences which
might encode similar enzymes from other organisms by using
certain regions, i.e., conserved sequence regions, of the
nucleotide sequence.

These and other aspects of the present invention will be 
apparent to those of skill in the art from the teachings
herein. 

BRIEF DESCRIPTION OF THE DRAWINGS 
The following drawings are illustrative of embodiments 
of the invention and are not meant to limit the scope of the 
invention as encompassed by the claims. 

Figure 1 is an illustration of the full-length DNA and
corresponding deduced amino acid sequence of Ammonifex
degensii KC4 of the present invention. Sequencing was
performed using a 378 automated DNA sequencer for all
sequences of the present invention (Applied Biosystems, Inc.,
Foster City, California).

Figure 2 is an illustration of the full-length DNA and 
corresponding deduced amino acid sequence of Methanococcus 
igneus Kol5. 

Figure 3 is an illustration of the full-length DNA and 
corresponding deduced amino acid sequence of Thermococcus 
alcaliphilus AEDII12RA. 

Figure 4 is an illustration of the full-length DNA and 
corresponding deduced amino acid sequence of Thermococcus
celer. 
Figure 5 is an illustration of the full-length DNA and
corresponding deduced amino acid sequence of Thermococcus
GU5L5.

Figure 6 is an illustration of the full-length DNA and
corresponding deduced amino acid sequence of OC9a.

Figure 7 is an illustration of the full-length DNA and 
corresponding deduced amino acid sequence of M11TL.

Figure 8 is an illustration of the full-length DNA and 
corresponding deduced amino acid sequence of Thermococcus
CL-2. 

Figure 9 is an illustration of the full-length DNA and 
corresponding deduced amino acid sequence of Aquifex VF-5.



DETAILED DESCRIPTION OF THE INVENTION

To facilitate understanding of the invention, a number 
of terms are defined below. 

The term "isolated" means altered "by the hand of man" 
from its natural state; i.e., if it occurs in nature, it has 
been changed or removed from its original environment, or 
both. For example, a naturally occurring polynucleotide or
a polypeptide naturally present in a living animal in its
natural state is not "isolated", but the same polynucleotide 
or polypeptide separated from the coexisting materials of its 
natural state is "isolated", as the term is employed herein. 
For example, with respect to polynucleotides, the term 
isolated means that it is separated from the nucleic acid and 
cell in which it naturally occurs. 






WO 97/48416 PCT/US97/10784

As part of or following isolation, such polynucleotides
can be joined to other polynucleotides, such as DNAs, for
mutagenesis, to form fusion proteins, and for propagation or 
expression in a host, for instance. The isolated 
polynucleotides, alone or joined to other polynucleotides 
such as vectors, can be introduced into host cells, in 
culture or in whole organisms. Introduced into host cells in 
culture or in whole organisms, such polynucleotides still 
would be isolated, as the term is used herein, because they 
would not be in their naturally occurring form or 
environment. Similarly, the polynucleotides and polypeptides 
may occur in a composition, such as a media formulation 
(solutions for introduction of polynucleotides or
polypeptides, for example, into cells or compositions or
solutions for chemical or enzymatic reactions which are not
naturally occurring compositions) and, therein, remain
isolated polynucleotides or polypeptides within the meaning 
of that term as it is employed herein. 

The term "ligation" refers to the process of forming 
phosphodiester bonds between two or more polynucleotides, 
which most often are double stranded DNAs. Techniques for 
ligation are well known to the art and protocols for ligation
are described in standard laboratory manuals and references,
such as, for instance, Sambrook et al., MOLECULAR CLONING, A
LABORATORY MANUAL, 2nd Ed.; Cold Spring Harbor Laboratory 
Press, Cold Spring Harbor, New York (1989) . 

The term "oligonucleotide" as used herein is defined as 
a molecule comprised of two or more deoxyribonucleotides or 
ribonucleotides, preferably more than three, and usually more 
than ten. The exact size of an oligonucleotide will depend 
on many factors, including the ultimate function or use of 
the oligonucleotide. Oligonucleotides can be prepared by any 
suitable method, including, for example, cloning and 
restriction of appropriate sequences and direct chemical
synthesis by a method such as the phosphotriester method of
Narang et al., 1979, Meth. Enzymol., 68:90-99; the
phosphodiester method of Brown et al., 1979, Meth. Enzymol.,
68:109-151; the diethylphosphoramidite method of Beaucage et
al., 1981, Tetrahedron Lett., 22:1859-1862; the triester
method of Matteucci et al., 1981, J. Am. Chem. Soc.,
103:3185-3191; or automated synthesis methods; and the solid
support method of U.S. Patent No. 4,458,066.

The term "plasmids" generally is designated herein by a 
lower case p preceded and/or followed by capital letters 
and/or numbers, in accordance with standard naming 
conventions that are familiar to those of skill in the art. 

Plasmids disclosed herein are either commercially
available, publicly available on an unrestricted basis, or 
can be constructed from available plasmids by routine 
application of well known, published procedures. Many 
plasmids and other cloning and expression vectors that can be 
used in accordance with the present invention are well known 
and readily available to those of skill in the art. 
Moreover, those of skill readily may construct any number of 
other plasmids suitable for use in the invention. The 
properties, construction and use of such plasmids, as well as 
other vectors, in the present invention will be readily 
apparent to those of skill from the present disclosure. 

The term "polynucleotide(s)" generally refers to any
polyribonucleotide or polydeoxyribonucleotide, which may be
unmodified RNA or DNA or modified RNA or DNA. Thus, for
instance, polynucleotides as used herein refers to, among
others, single- and double-stranded DNA, DNA that is a
mixture of single- and double-stranded regions, single- and
double-stranded RNA, RNA that is a mixture of single- and
double-stranded regions, and hybrid molecules comprising DNA
and RNA that may be single-stranded or, more typically,
double-stranded or a mixture of single- and double-stranded
regions.

In addition, polynucleotide as used herein refers to 
triple-stranded regions comprising RNA or DNA or both RNA and
DNA. The strands in such regions may be from the same 
molecule or from different molecules. The regions may 
include all of one or more of the molecules, but more 
typically involve only a region of some of the molecules. 
One of the molecules of a triple-helical region often is an 
oligonucleotide.

As used herein, the term polynucleotide includes DNAs or 
RNAs as described above that contain one or more modified 
bases. Thus, DNAs or RNAs with backbones modified for 
stability or for other reasons are "polynucleotides" as that
term is intended herein. Moreover, DNAs or RNAs comprising 
unusual bases, such as inosine, or modified bases, such as 
tritylated bases, to name just two examples, are 
polynucleotides as the term is used herein. 

It will be appreciated that a great variety of 
modifications have been made to DNA and RNA that serve many 
useful purposes known to those of skill in the art. The term 
polynucleotide as it is employed herein embraces such 
chemically, enzymatically or metabolically modified forms of 
polynucleotides, as well as the chemical forms of DNA and RNA 
characteristic of viruses and cells, including simple and 
complex cells, inter alia. 

The term "primer" as used herein refers to an 
oligonucleotide, whether natural or synthetic, which is 
capable of acting as a point of initiation of synthesis when 
placed under conditions in which primer extension is 
initiated or possible. Synthesis of a primer extension 
product which is complementary to a nucleic acid strand is 
initiated in the presence of nucleoside triphosphates and a
polymerase in an appropriate buffer at a suitable
temperature.

The term "primer" may refer to more than one primer, 
particularly in the case where there is some ambiguity in the 
information regarding one or both ends of the target region 
to be synthesized. For instance, if a nucleic acid sequence 
is inferred from a protein sequence, a "primer" generated to 
synthesize nucleic acid encoding said protein sequence is 
actually a collection of primer oligonucleotides containing 
sequences representing all possible codon variations based on 
the degeneracy of the genetic code. One or more of the 
primers in this collection will be homologous with the end of 
the target sequence. Likewise, if a "conserved" region shows 
significant levels of polymorphism in a population, mixtures 
of primers can be prepared that will amplify adjacent
sequences.
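
The notion of a "primer" as a collection of oligonucleotides covering all possible codon variations can be illustrated with a minimal sketch. The peptide "MDKW" and the reduced codon table below are hypothetical and chosen only for illustration; they are not taken from any sequence of the present disclosure.

```python
from itertools import product

# Partial standard genetic code: codons grouped by one-letter amino acid.
# Only the residues used in the illustrative peptide are listed (assumption).
CODONS = {
    "M": ["ATG"],
    "W": ["TGG"],
    "D": ["GAT", "GAC"],
    "K": ["AAA", "AAG"],
}

def degenerate_primers(peptide):
    """Return every DNA sequence that could encode the given peptide."""
    choices = [CODONS[residue] for residue in peptide]
    return ["".join(combo) for combo in product(*choices)]

# Hypothetical peptide, not from the disclosure.
primers = degenerate_primers("MDKW")
for primer in primers:
    print(primer)
```

For this peptide the collection holds 1 x 2 x 2 x 1 = 4 oligonucleotides, one of which matches any given natural coding sequence for the peptide.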

The terms "restriction endonucleases" and "restriction
enzymes" refer to bacterial enzymes which cut double-stranded
DNA at or near a specific nucleotide sequence.

The term "gene" means the segment of DNA involved in
producing a polypeptide chain; it includes regions preceding
and following the coding region (leader and trailer) as well
as intervening sequences (introns) between individual coding
segments (exons).

A coding sequence is "operably linked" to another coding 
sequence when RNA polymerase will transcribe the two coding 
sequences into a single mRNA, which is then translated into 
a single polypeptide having amino acids derived from both 
coding sequences. The coding sequences need not be 
contiguous to one another so long as the expressed sequences
are ultimately processed to produce the desired protein.

"Recombinant" enzymes refer to enzymes produced by 
recombinant DNA techniques; i.e., produced from cells 
transformed by an exogenous DNA construct encoding the 
desired enzyme. "Synthetic" enzymes are those prepared by 
chemical synthesis. 

A DNA "coding sequence of" or a "nucleotide sequence 
encoding" a particular enzyme, is a DNA sequence which is 
transcribed and translated into an enzyme when placed under 
the control of appropriate regulatory sequences. 

The term "thermostable phosphatase" refers to an enzyme 
which is stable to heat and heat-resistant and catalyzes the 
removal of phosphate groups from organophosphate ester 
compounds. Reference to "thermostable phosphatases" includes 
alkaline phosphatases, phosphodiesterases and phytases. 

The phosphatase enzymes of the present invention cannot 
become irreversibly denatured (inactivated) when subjected to 
the elevated temperatures for the time necessary to effect 
the hydrolysis of a phosphate group from an organophosphate 
ester compound. Irreversible denaturation for purposes 
herein refers to permanent and complete loss of enzymatic 
activity. The phosphatase enzymes do not become irreversibly 
denatured from exposure to temperatures in a range from about
60°C to about 113°C or more. The extreme thermostability of
the phosphatase enzymes provides additional advantages over 
previously characterized thermostable enzymes. Prior to the 
present invention, efficient hydrolysis of phosphate groups 
at temperatures as high as 100°C has not been demonstrated.
No thermostable phosphatase has been described for this
purpose.
In accordance with an aspect of the present invention,
there are provided isolated nucleic acids (polynucleotides)
which encode for the mature enzymes having the deduced amino
acid sequences of Figures 1-9 (SEQ ID NOS: 28-36).

In accordance with another aspect of the present 
invention, there are provided isolated polynucleotides 
encoding the enzymes of the present invention. The deposited 
material is a mixture of genomic clones comprising DNA 
encoding an enzyme of the present invention. Each genomic 
clone comprising the respective DNA has been inserted into a 
pBluescript vector (Stratagene, La Jolla, CA). The deposit
has been deposited with the American Type Culture Collection, 
12301 Parklawn Drive, Rockville, Maryland 20852, USA, on May 
10, 1996 and assigned ATCC Deposit No. 97536. 

The deposit(s) have been made under the terms of the
Budapest Treaty on the International Recognition of the
Deposit of Microorganisms for the Purposes of Patent Procedure.
The strains will be irrevocably and without restriction or
condition released to the public upon the issuance of a
patent. These deposits are provided merely as a convenience to
those of skill in the art and are not an admission that a 
deposit be required under 35 U.S.C. §112. The sequences of 
the polynucleotides contained in the deposited materials, as 
well as the amino acid sequences of the polypeptides encoded 
thereby, are controlling in the event of any conflict with 
any description of sequences herein. A license may be 
required to make, use or sell the deposited materials, and no 
such license is hereby granted. 

The polynucleotides of this invention were originally 
recovered from genomic gene libraries derived from the 
following organisms: 
Ammonifex degensii KC4 is a eubacterium from the genus
Ammonifex. It was isolated in Java, Indonesia. It is a
gram-negative chemolithoautotroph. It grows optimally at
70°C in a low-salt culture medium at pH 7 with 0.2% nitrate
as a substrate and H2/CO2 in gas phase.

Methanococcus igneus Kol5 is a Euryarchaeota isolated
from Kolbeinsey Ridge in the north of Iceland. It grows
optimally at 85°C and pH 7.0 in a high-salt marine medium
with H2/CO2 in a gas phase.

Aquifex pyrophilus Kol5A is a marine bacterium isolated from
the Kolbeinsey Ridge in the north of Iceland. It is a
gram-negative, rod-shaped, strictly chemolithoautotrophic,
knall gas bacterium, and a denitrifier. It grows optimally
at 85°C in high-salt marine medium at pH 6.8 with O2 as a
substrate and H2/CO2 + 0.5% O2 in gas phase.

Thermococcus alcaliphilus AEDII12RA is from the genus
Thermococcus. AEDII12RA grows optimally at 85°C, pH 9.5 in
a high salt medium (marine) containing polysulfides and yeast
extract as substrates and in gas phase.

Thermococcus celer is an Euryarchaeota. It grows
optimally at 85°C and pH 6.0 in a high-salt marine medium
containing elemental sulfur, yeast extract, and peptone as 
substrates and in gas phase. 

Thermococcus GU5L5 is an Euryarchaeota isolated from the 
Guaymas Basin in Mexico. It grows optimally at 85°C and pH
6.0 in a high-salt marine medium containing 1% elemental
sulfur, 0.4% yeast extract, and 0.5% peptone as substrates
with N2 in gas phase.

OC9a-27A3A is a bacterium of unknown etiology obtained
from Yellowstone National Park and maintained as a pure
culture. It grows well on a TK6 medium and has cellulose
degrader activity. Further, it codes for an alkaline
phosphatase having greater than 50% polypeptide identity and
greater than 32% polynucleotide identity to each of the Bombyx
mori and Escherichia coli C alkaline phosphatase precursors,
which is significant homology. Thus, it is expected that
OC9a-27A3A can be cloned and expressed readily in Escherichia
coli C in place of its native alkaline phosphatase precursor.

M11TL is a new species of Desulfurococcus isolated from
Diamond Pool in Yellowstone National Park. M11TL grows
heterotrophically by fermentation of different organic
materials (sulfur is not necessary) and forms grape-like
aggregates. The organism grows optimally at 85°C to 88°C and
pH 7.0 in a low salt medium containing yeast extract,
peptone, and gelatin as substrates with an N2/CO2 gas phase.

Thermococcus CL-2 is an Euryarchaeota isolated from the
North Cleft Segment in the Juan de Fuca Ridge. It grows 
optimally at 88°C in a salt medium with an argon atmosphere. 

Aquifex VF-5 is a marine bacterium isolated from a beach
in Vulcano, Italy. It is a gram-negative, rod-shaped,
strictly chemolithoautotrophic, knall gas bacterium. It
grows optimally from 85-90°C in high-salt marine medium at pH
6.8, with O2 as a substrate and H2/CO2 + 0.5% O2 in gas phase.

Accordingly, the polynucleotides and enzymes encoded 
thereby are identified by the organism from which they were 
isolated, and are sometimes hereinafter referred to as "KC4"
(Figure 1 and SEQ ID NOS:19 and 28), "Kol5" (Figure 2 and SEQ
ID NOS:20 and 29), "AEDII12RA" (Figure 3 and SEQ ID NOS:21
and 30), "Celer" (Figure 4 and SEQ ID NOS:22 and 31), "GU5L5"
(Figure 5 and SEQ ID NOS:23 and 32), "OC9a" (Figure 6 and SEQ
ID NOS:24 and 33), "M11TL" (Figure 7 and SEQ ID NOS:25 and
34), "CL-2" (Figure 8 and SEQ ID NOS:26 and 35) and "VF-5"
(Figure 9 and SEQ ID NOS:27 and 36).

The polynucleotides and polypeptides of the present 
invention show identity at the nucleotide and protein level
to known genes and proteins encoded thereby as shown in
Table 1. 
Table 1

Clone                       Gene/Protein with                           Protein     Nucleic Acid
                            Closest Homology                            Identity    Identity
--------------------------------------------------------------------------------------------------

Ammonifex degensii          Yarrowia lipolytica, Candida lipolytica,    47%         24%
KC4-3A1A                    acid phosphatase

Ammonifex degensii          Saccharomyces cerevisiae, hypothetical      54%         26%
KC4-3A1A                    protein YBR094w

Methanococcus igneus        Yarrowia lipolytica, Candida lipolytica,    45%         25%
Kol5-9A1A                   acid phosphatase

Methanococcus igneus        Saccharomyces cerevisiae, hypothetical      52%         25%
Kol5-9A1A                   protein YBR094w, hypothetical protein
                            YBR0821

Thermococcus alcaliphilus   No homology found                           --          --
AEDII12RA-18A

Thermococcus celer 25A1A    No homology found                           --          --

Thermococcus GU5L5-26A1A    [illegible in source] IV precursor,         [illegible] 38%
                            alkaline phosphomonoesterase,
                            glycerophosphatase, and
                            phosphomonoesterase

Thermococcus GU5L5-26A1A    Bacillus subtilis, alkaline phosphatase     [illegible] [illegible]
                            III precursor

OC9a-27A3A                  Bombyx mori (silkworm), alkaline            54%         33%
                            phosphatase precursor

OC9a-27A3A                  Escherichia coli C, alkaline                53%         34%
                            phosphatase precursor

M11TL-29A1A                 Rhodobacter capsulatus, hypothetical        43%         24%

Thermococcus CL2-30A1A      Yarrowia lipolytica, Candida lipolytica,    49%         27%
                            acid phosphatase

Thermococcus CL2-30A1A      Saccharomyces cerevisiae, hypothetical      50%         25%
                            protein YBR094w, hypothetical protein
                            YBR0821

Aquifex VF5-34A1A           Escherichia coli, suppressor protein        57%         34%
                            suhB

All of the clones identified in Table 1 encode
polypeptides which have phosphatase activity.

One means for isolating the nucleic acid molecules 
encoding the enzymes of the present invention is to probe a 
gene library with a natural or artificially designed probe 
using art recognized procedures (see, for example: Current
Protocols in Molecular Biology, Ausubel F.M. et al. (EDS.)
Greene Publishing Company Assoc. and John Wiley Interscience,
New York, 1989, 1992). It is appreciated by one skilled in
the art that the polynucleotides of SEQ ID NOS: 1-18, or
fragments thereof (comprising at least 12 contiguous
nucleotides), are particularly useful probes. Other
particularly useful probes for this purpose are hybridizable
fragments of the sequences of SEQ ID NOS: 19-27 (i.e.,
comprising at least 12 contiguous nucleotides).
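
As a minimal sketch of what "fragments comprising at least 12 contiguous nucleotides" means in practice, the following enumerates every 12-nucleotide window of a sequence. The 20-nucleotide sequence shown is hypothetical and is not taken from any SEQ ID NO of the disclosure; the function name and the default length parameter are likewise assumptions made for illustration.

```python
def contiguous_fragments(sequence, length=12):
    """Yield every window of `length` contiguous nucleotides in `sequence`."""
    for start in range(len(sequence) - length + 1):
        yield sequence[start:start + length]

# Hypothetical 20-nucleotide sequence used only for illustration.
example = "ATGGCTAGCTAGGATCCGTA"
probes = list(contiguous_fragments(example))

for probe in probes:
    print(probe)
```

A sequence of N nucleotides yields N - 11 candidate fragments of the minimum length, so the 20-nucleotide example gives nine; longer fragments are obtained by raising `length`.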

With respect to nucleic acid sequences which hybridize 
to specific nucleic acid sequences disclosed herein, 
hybridization may be carried out under conditions of reduced 
stringency, medium stringency or even stringent conditions. 
As an example of oligonucleotide hybridization, a polymer
membrane containing immobilized denatured nucleic acids is
first prehybridized for 30 minutes at 45°C in a solution
consisting of 0.9 M NaCl, 50 mM NaH2PO4, pH 7.0, 5.0 mM
Na2EDTA, 0.5% SDS, 10X Denhardt's, and 0.5 mg/mL
polyriboadenylic acid. Approximately 2 X 10^7 cpm (specific
activity 4-9 X 10^8 cpm/ug) of 32P end-labeled oligonucleotide
probe are then added to the solution. After 12-16 hours of
incubation, the membrane is washed for 30 minutes at room
temperature in 1X SET (150 mM NaCl, 20 mM Tris hydrochloride,
pH 7.8, 1 mM Na2EDTA) containing 0.5% SDS, followed by a 30
minute wash in fresh 1X SET at (Tm less 10°C) for the
oligonucleotide probe. The membrane is then exposed to
autoradiographic film for detection of hybridization signals.
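
The final wash temperature (Tm less 10°C) depends on the melting temperature of the particular probe. The text does not state how Tm is to be estimated; the sketch below assumes the common Wallace rule for short oligonucleotides, Tm = 2(A+T) + 4(G+C) in °C, which is only an approximation. The 18-mer probe is hypothetical.

```python
def wallace_tm(oligo):
    """Estimate Tm (deg C) of a short oligonucleotide by the Wallace rule.

    Assumption: the Wallace rule is used here for illustration only; the
    source text does not specify a method for estimating Tm.
    """
    oligo = oligo.upper()
    at_count = oligo.count("A") + oligo.count("T")
    gc_count = oligo.count("G") + oligo.count("C")
    return 2 * at_count + 4 * gc_count

def stringent_wash_temp(oligo, offset=10):
    """Return the wash temperature, taken as Tm less `offset` deg C."""
    return wallace_tm(oligo) - offset

# Hypothetical 18-mer probe (8 A/T and 10 G/C residues).
probe = "ATGGCTAGCTAGGATCCG"
print(wallace_tm(probe), stringent_wash_temp(probe))
```

For this probe the estimate is 2 x 8 + 4 x 10 = 56°C, giving a wash at 46°C. Other estimation methods would shift the result, so the figure should be read as indicative only.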

Stringent conditions means hybridization will occur only
if there is at least 90% identity, preferably at least 95%
identity and most preferably at least 97% identity between
the sequences. Further, it is understood that a section of
a 100 bps sequence that is 95 bps in length has 95% identity
with the 100 bps sequence from which it is obtained. See J.
Sambrook et al., Molecular Cloning, A Laboratory Manual, 2d
Ed., Cold Spring Harbor Laboratory (1989) which is hereby
incorporated by reference in its entirety.

As used herein, a first DNA (RNA) sequence is at least 
70% and preferably at least 80% identical to another DNA 
(RNA) sequence if there is at least 70% and preferably at 
least a 80% or 90% identity, respectively, between the bases 
of the first sequence and the bases of the other sequence,
when properly aligned with each other, for example when 
aligned by BLASTN. 
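
The identity figures used above can be illustrated with a minimal sketch that counts identical bases between two sequences that are assumed to be already aligned (for example by BLASTN), with gaps written as "-". Counting gap positions as non-identical is an assumption chosen so that the result agrees with the statement that a 95 bps section of a 100 bps sequence has 95% identity with that sequence. The sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences carry the same base.

    Assumptions: inputs are pre-aligned and of equal length; '-' marks a gap;
    gap columns count as non-identical.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)

# Hypothetical 100-base sequence and a 95-base section of it, padded with gaps.
full_length = "ACGT" * 25
section = full_length[:95] + "-" * 5
print(percent_identity(full_length, section))
```

Under these assumptions the 95-base section scores 95% against its 100-base parent, which would satisfy the "at least 70%" and "at least 80%" thresholds defined above.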

The present invention relates to polynucleotides which 
differ from the reference polynucleotide such that the 
differences are silent, for example, the amino acid sequence 
encoded by the polynucleotides is the same. The present 
invention also relates to nucleotide changes which result in 
amino acid substitutions, additions, deletions, fusions and 
truncations in the polypeptide encoded by the reference 
polynucleotide. In a preferred aspect of the invention these 
polypeptides retain the same biological action as the 
polypeptide encoded by the reference polynucleotide. 

The polynucleotides of this invention were recovered 
from genomic gene libraries from the organisms listed in 
Table 1. Gene libraries were generated in either a
Lambda ZAP II or a pBluescript cloning vector (Stratagene
Cloning Systems). Mass excisions were performed on these
libraries to generate libraries in the pBluescript phagemid.
Libraries were generated and excisions were performed 
according to the protocols/methods hereinafter described. 

The polynucleotides of the present invention may be in 
the form of RNA or DNA which DNA includes cDNA, genomic DNA, 
and synthetic DNA. The DNA may be double-stranded or 
single-stranded, and if single-stranded may be the coding 
strand or non-coding (anti-sense) strand. The coding 
sequence which encodes the mature enzymes may be identical 
to the coding sequences shown in Figures 1-9 (SEQ ID NOS: 
19-27) or may be a different coding sequence which, as a 
result of the redundancy or degeneracy of the genetic code, 
encodes the same mature enzymes as the DNA of Figures 1-9 
(SEQ ID NOS: 19-27). 

The polynucleotide which encodes for the mature enzyme 
of Figures 1-9 (SEQ ID NOS: 28-36) may include, but is not 
limited to: only the coding sequence for the mature enzyme; 
the coding sequence for the mature enzyme and additional 
coding sequence such as a leader sequence or a proprotein 
sequence; the coding sequence for the mature enzyme (and 
optionally additional coding sequence) and non-coding 
sequence, such as introns or non-coding sequence 5' and/or 3' 
of the coding sequence for the mature enzyme. 

Thus, the term "polynucleotide encoding an enzyme 
(protein)" encompasses a polynucleotide which includes only 
coding sequence for the enzyme as well as a polynucleotide 
which includes additional coding and/or non-coding sequence. 

The present invention further relates to variants of the 
hereinabove described polynucleotides which encode for 
fragments, analogs and derivatives of the enzymes having the 
deduced amino acid sequences of Figures 1-9 (SEQ ID NOS: 
28-36). The variant of the polynucleotide may be a naturally 
occurring allelic variant of the polynucleotide or a 
non-naturally occurring variant of the polynucleotide. 

Thus, the present invention includes polynucleotides 
encoding the same mature enzymes as shown in Figures 1-9 (SEQ 
ID NOS: 19-27) as well as variants of such polynucleotides 
which variants encode for a fragment, derivative or analog of 
the enzymes of Figures 1-9 (SEQ ID NOS: 19-27). Such 
nucleotide variants include deletion variants, substitution 
variants and addition or insertion variants. 

As hereinabove indicated, the polynucleotides may have 
a coding sequence which is a naturally occurring allelic 
variant of the coding sequences shown in Figures 1-9 (SEQ ID 
NOS: 19-27). As known in the art, an allelic variant is an 
alternate form of a polynucleotide sequence which may have a 
substitution, deletion or addition of one or more 
nucleotides, which does not substantially alter the function 
of the encoded enzyme. Also, using directed and other 
evolution strategies, one may make very minor changes in DNA 
sequence which can result in major changes in function. 

Fragments of the full length gene of the present 
invention may be used as hybridization probes for a cDNA or 
a genomic library to isolate the full length DNA and to 
isolate other DNAs which have a high sequence similarity to 
the gene or similar biological activity. Probes of this type 
preferably have at least 10, preferably at least 15, and even 
more preferably at least 30 bases and may contain, for 
example, at least 50 or more bases. In fact, probes of this 
type having at least up to 150 bases or greater may be 
preferably utilized. The probe may also be used to identify 
a DNA clone corresponding to a full length transcript and a 
genomic clone or clones that contain the complete gene 
including regulatory and promoter regions, exons and introns. 
An example of a screen comprises isolating the coding region 
of the gene by using the known DNA sequence to synthesize an 
oligonucleotide probe. Labeled oligonucleotides having a 
sequence complementary or identical to that of the gene or 
portion of the gene sequences of the present invention are 
used to screen a library of genomic DNA to determine which 
members of the library the probe hybridizes to. 
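
A minimal sketch of deriving such an oligonucleotide probe from a known sequence is given below. The function names, the default probe length of 30 bases, and the example sequence are illustrative assumptions; the minimum of 10 bases reflects the lower bound mentioned above.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def design_probe(gene_sequence: str, start: int, length: int = 30,
                 antisense: bool = True) -> str:
    """Take a window of the known sequence as a probe.

    With antisense=True the probe is complementary to the given strand;
    otherwise it is identical to it. Parameter names are hypothetical.
    """
    if length < 10:
        raise ValueError("probe should have at least 10 bases")
    if start < 0 or start + length > len(gene_sequence):
        raise ValueError("probe window falls outside the sequence")
    window = gene_sequence[start:start + length]
    return reverse_complement(window) if antisense else window


known_sequence = "ATGCATGCATGCATGC"   # hypothetical known DNA sequence
print(design_probe(known_sequence, 0, 12, antisense=False))
print(design_probe(known_sequence, 0, 12, antisense=True))
```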

It is also appreciated that such probes can be and are 
preferably labeled with an analytically detectable reagent to 
facilitate identification of the probe. Useful reagents 
include but are not limited to radioactivity, fluorescent 
dyes or enzymes capable of catalyzing the formation of a 
detectable product. The probes are thus useful to isolate 
complementary copies of DNA from other sources or to screen 
such sources for related sequences. 

The present invention further relates to 
polynucleotides which hybridize to the hereinabove-described 
sequences if there is at least 70%, preferably at least 90%, 
and more preferably at least 95% identity between the 
sequences. (As indicated above, 70% identity would include 
within such definition a 70 bps fragment taken from a 100 bp 
polynucleotide, for example.) The present invention 
particularly relates to polynucleotides which hybridize under 
stringent conditions to the hereinabove-described 
polynucleotides. As herein used, the term "stringent 
conditions" means hybridization will occur only if there is 
at least 95% and preferably at least 97% identity between the 
sequences. The polynucleotides which hybridize to the 
hereinabove described polynucleotides in a preferred 
embodiment encode enzymes which retain substantially 
the same biological function or activity as the mature enzyme 
encoded by the DNA of Figures 1-9 (SEQ ID NOS: 19-27). In 
referring to identity in the case of hybridization, as known 
in the art, such identity refers to the complementarity of 
two polynucleotide segments. 

Alternatively, the polynucleotide may have at least 15 
bases, preferably at least 30 bases, and more preferably at 
least 50 bases which hybridize to any part of a 
polynucleotide of the present invention and which has an 
identity thereto, as hereinabove described, and which may or 
may not retain activity. For example, such polynucleotides 
may be employed as probes for the polynucleotides of SEQ ID 
NOS: 19-27, for example, for recovery of the polynucleotide 
or as a diagnostic probe or as a PCR primer. 

Thus, the present invention is directed to 
polynucleotides having at least a 70% identity, preferably at 
least 90% identity and more preferably at least a 95% 
identity to a polynucleotide which encodes the enzymes of SEQ 
ID NOS: 28-36 as well as fragments thereof, which fragments 
have at least 15 bases, preferably at least 30 bases, more 
preferably at least 50 bases and most preferably fragments 
having up to at least 150 bases or greater, which fragments 
are at least 90% identical, preferably at least 95% identical 
and most preferably at least 97% identical to any portion of 
a polynucleotide of the present invention. 

The present invention further relates to enzymes which 
have the deduced amino acid sequences of Figures 1-9 (SEQ ID 
NOS: 28-36) as well as fragments, analogs and derivatives of 
such enzymes. 

The terms "fragment," "derivative" and "analog" when 
referring to the enzymes of Figures 1-9 (SEQ ID NOS: 28-36) 
mean enzymes which retain essentially the same biological 
function or activity as such enzymes. Thus, an analog 
includes a proprotein which can be activated by cleavage of 
the proprotein portion to produce an active mature enzyme. 

The enzymes of the present invention may be a 
recombinant enzyme, a natural enzyme or a synthetic enzyme, 
preferably a recombinant enzyme. 

The fragment, derivative or analog of the enzymes of 
Figures 1-9 (SEQ ID NOS: 28-36) may be (i) one in which one or 
more of the amino acid residues are substituted with a 
conserved or non-conserved amino acid residue (preferably a 
conserved amino acid residue) and such substituted amino acid 
residue may or may not be one encoded by the genetic code, or 
(ii) one in which one or more of the amino acid residues 
includes a substituent group, or (iii) one in which the 
mature enzyme is fused with another compound, such as a 
compound to increase the half-life of the enzyme (for 
example, polyethylene glycol), or (iv) one in which the 
additional amino acids are fused to the mature enzyme, such 
as a leader or secretory sequence or a sequence which is 
employed for purification of the mature enzyme or a 
proprotein sequence. Such fragments, derivatives and analogs 
are deemed to be within the scope of those skilled in the art 
from the teachings herein. 

The enzymes and polynucleotides of the present invention 
are preferably provided in an isolated form, and preferably 
are purified to homogeneity. 

The term "isolated" means that the material is removed 
from its original environment (e.g., the natural environment 
if it is naturally occurring). For example, a 
naturally-occurring polynucleotide or enzyme present in a 
living animal is not isolated, but the same polynucleotide 
or enzyme, 
separated from some or all of the coexisting materials in the 
natural system, is isolated. Such polynucleotides could be 
part of a vector and/or such polynucleotides or enzymes could 
be part of a composition, and still be isolated in that such 
vector or composition is not part of its natural environment. 

The enzymes of the present invention include the enzymes 
of SEQ ID NOS: 28-36 (in particular the mature enzyme) as 
well as enzymes which have at least 70% similarity 
(preferably at least 70% identity) to the enzymes of SEQ ID 
NOS: 28-36 and more preferably at least 90% similarity (more 
preferably at least 90% identity) to the enzymes of SEQ ID 
NOS: 28-36 and still more preferably at least 95% similarity 
(still more preferably at least 95% identity) to the enzymes 
of SEQ ID NOS: 28-36 and also include portions of such 
enzymes with such portion of the enzyme generally containing 
at least 30 amino acids and more preferably at least 50 amino 
acids and most preferably at least up to 150 amino acids. 

As known in the art "similarity" between two enzymes is 
determined by comparing the amino acid sequence and its 
conserved amino acid substitutes of one enzyme to the 
sequence of a second enzyme. The definition of 70% 
similarity would include a 70 amino acid sequence fragment of 
a 100 amino acid sequence, for example, or a 70 amino acid 
sequence obtained by sequentially or randomly deleting 30 
amino acids from the 100 amino acid sequence. 

A variant, i.e. a "fragment", "analog" or "derivative" 
polypeptide, and reference polypeptide may differ in amino 
acid sequence by one or more substitutions, additions, 
deletions, fusions and truncations, which may be present in 
any combination. 

Among preferred variants are those that vary from a 
reference by conservative amino acid substitutions. Such 
substitutions are those that substitute a given amino acid in 
a polypeptide by another amino acid of like characteristics. 
Typically seen as conservative substitutions are the 
replacements, one for another, among the aliphatic amino 
acids Ala, Val, Leu and Ile; interchange of the hydroxyl 
residues Ser and Thr, exchange of the acidic residues Asp and 
Glu, substitution between the amide residues Asn and Gln, 
exchange of the basic residues Lys and Arg and replacements 
among the aromatic residues Phe, Tyr. 
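
The distinction between identity and similarity can be illustrated with a minimal sketch that uses only the conservative substitution groups listed above. It assumes two pre-aligned, equal-length, gap-free amino acid sequences in one-letter code; the function names and example sequences are hypothetical, and practical similarity scoring commonly relies on substitution matrices instead.

```python
# Conservative groups as listed in the text, in one-letter code.
CONSERVATIVE_GROUPS = [
    set("AVLI"),  # aliphatic: Ala, Val, Leu, Ile
    set("ST"),    # hydroxyl: Ser, Thr
    set("DE"),    # acidic: Asp, Glu
    set("NQ"),    # amide: Asn, Gln
    set("KR"),    # basic: Lys, Arg
    set("FY"),    # aromatic: Phe, Tyr
]


def is_conservative(a: str, b: str) -> bool:
    """True if a and b differ but belong to the same listed group."""
    return a != b and any(a in g and b in g for g in CONSERVATIVE_GROUPS)


def identity_and_similarity(seq_a: str, seq_b: str) -> tuple[float, float]:
    """Return (percent identity, percent similarity) for aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(zip(seq_a.upper(), seq_b.upper()))
    identical = sum(1 for a, b in pairs if a == b)
    conservative = sum(1 for a, b in pairs if is_conservative(a, b))
    n = len(pairs)
    # Similarity counts identical residues plus conservative substitutes.
    return 100.0 * identical / n, 100.0 * (identical + conservative) / n


# A/V and D/E are conservative; S and K are identical; F/W is neither.
print(identity_and_similarity("ASDKF", "VSEKW"))
```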

Most highly preferred are variants which retain the same 
biological function and activity as the reference polypeptide 
from which they vary. 

Fragments or portions of the enzymes of the present 
invention may be employed for producing the corresponding 
full-length enzyme by peptide synthesis; therefore, the 
fragments may be employed as intermediates for producing the 
full-length enzymes. Fragments or portions of the 
polynucleotides of the present invention may be used to 
synthesize full-length polynucleotides of the present 
invention. 

The present invention also relates to vectors which 
include polynucleotides of the present invention, host cells 
which are genetically engineered with vectors of the 
invention and the production of enzymes of the invention by 
recombinant techniques. 

Host cells are genetically engineered (transduced or 
transformed or transfected) with the vectors of this 
invention which may be, for example, a cloning vector such as 
an expression vector. The vector may be, for example, in the 
form of a plasmid, a phage, etc. The engineered host cells 
can be cultured in conventional nutrient media modified as 
appropriate for activating promoters, selecting transformants 
or amplifying the genes of the present invention. The 
culture conditions, such as temperature, pH and the like, are 
those previously used with the host cell selected for 
expression, and will be apparent to the ordinarily skilled 
artisan. 

The polynucleotides of the present invention may be 
employed for producing enzymes by recombinant techniques. 
Thus, for example, the polynucleotide may be included in any 
one of a variety of expression vectors for expressing an 
enzyme. Such vectors include chromosomal, nonchromosomal and 
synthetic DNA sequences, e.g., derivatives of SV40; bacterial 
plasmids; phage DNA; baculovirus; yeast plasmids; vectors 
derived from combinations of plasmids and phage DNA, viral 
DNA such as vaccinia, adenovirus, fowl pox virus, and 
pseudorabies. However, any other vector may be used as long 
as it is replicable and viable in the host. 

The appropriate DNA sequence may be inserted into the 
vector by a variety of procedures. In general, the DNA 
sequence is inserted into an appropriate restriction 
endonuclease site(s) by procedures known in the art. Such 
procedures and others are deemed to be within the scope of 
those skilled in the art. 

The DNA sequence in the expression vector is operatively 
linked to an appropriate expression control sequence(s) 
(promoter) to direct mRNA synthesis. As representative 
examples of such promoters, there may be mentioned: LTR or 
SV40 promoter, the E. coli lac or trp, the phage lambda 
promoter and other promoters known to control expression of 
genes in prokaryotic or eukaryotic cells or their viruses. 
The expression vector also contains a ribosome binding site 
for translation initiation and a transcription terminator. 
The vector may also include appropriate sequences for 
amplifying expression. 

In addition, the expression vectors preferably contain 
one or more selectable marker genes to provide a phenotypic 
trait for selection of transformed host cells such as 
dihydrofolate reductase or neomycin resistance for eukaryotic 
cell culture, or such as tetracycline or ampicillin 
resistance in E. coli. 

The vector containing the appropriate DNA sequence as 
hereinabove described, as well as an appropriate promoter or 
control sequence, may be employed to transform an appropriate 
host to permit the host to express the protein. 

As representative examples of appropriate hosts, there 
may be mentioned: bacterial cells, such as E. coli, 
Streptomyces, Bacillus subtilis; fungal cells, such as yeast; 
insect cells such as Drosophila S2 and Spodoptera Sf9; animal 
cells such as CHO, COS or Bowes melanoma; adenoviruses; plant 
cells, etc. The selection of an appropriate host is deemed 
to be within the scope of those skilled in the art from the 
teachings herein. 

More particularly, the present invention also includes 
recombinant constructs comprising one or more of the 
sequences as broadly described above. The constructs 
comprise a vector, such as a plasmid or viral vector, into 
which a sequence of the invention has been inserted, in a 
forward or reverse orientation. In a preferred aspect of this 
embodiment, the construct further comprises regulatory 
sequences, including, for example, a promoter, operably 
linked to the sequence. Large numbers of suitable vectors 
and promoters are known to those of skill in the art, and are 
commercially available. The following vectors are provided 
by way of example: Bacterial: pQE70, pQE60, pQE-9 (Qiagen), 
pBluescript II KS, ptrc99a, pKK223-3, pDR540, pRIT2T 
(Pharmacia); Eukaryotic: pXT1, pSG5 (Stratagene), pSVK3, pBPV, 
pMSG, pSVL SV40 (Pharmacia). However, any other plasmid or 
vector may be used as long as they are replicable and viable 
in the host. 

Promoter regions can be selected from any desired gene 
using CAT (chloramphenicol transferase) vectors or other 
vectors with selectable markers. Two appropriate vectors are 
pKK232-8 and pCM7. Particular named bacterial promoters 
include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp. 
Eukaryotic promoters include CMV immediate early, HSV 
thymidine kinase, early and late SV40, LTRs from retrovirus, 
and mouse metallothionein-I. Selection of the appropriate 
vector and promoter is well within the level of ordinary 
skill in the art. 

In a further embodiment, the present invention relates 
to host cells containing the above-described constructs. The 
host cell can be a higher eukaryotic cell, such as a 
mammalian cell, or a lower eukaryotic cell, such as a yeast 
cell, or the host cell can be a prokaryotic cell, such as a 
bacterial cell. Introduction of the construct into the host 
cell can be effected by calcium phosphate transfection, 
DEAE-Dextran mediated transfection, or electroporation (Davis, 
L., Dibner, M., Battey, I., Basic Methods in Molecular Biology, 
(1986)). 

The constructs in host cells can be used in a 
conventional manner to produce the gene product encoded by 
the recombinant sequence. Alternatively, the enzymes of the 
invention can be synthetically produced by conventional 
peptide synthesizers. 

Mature proteins can be expressed in mammalian cells, 
yeast, bacteria, or other cells under the control of 
appropriate promoters. Cell-free translation systems can 
also be employed to produce such proteins using RNAs derived 
from the DNA constructs of the present invention. 
Appropriate cloning and expression vectors for use with 
prokaryotic and eukaryotic hosts are described by Sambrook et 
al., Molecular Cloning: A Laboratory Manual, Second Edition, 
Cold Spring Harbor, N.Y., (1989), the disclosure of which is 
hereby incorporated by reference. 

Transcription of the DNA encoding the enzymes of the 
present invention by higher eukaryotes is increased by 
inserting an enhancer sequence into the vector. Enhancers 
are cis-acting elements of DNA, usually about from 10 to 300 
bp that act on a promoter to increase its transcription. 
Examples include the SV40 enhancer on the late side of the 
replication origin bp 100 to 270, a cytomegalovirus early 
promoter enhancer, the polyoma enhancer on the late side of 
the replication origin, and adenovirus enhancers. 

Generally, recombinant expression vectors will include 
origins of replication and selectable markers permitting 
transformation of the host cell, e.g., the ampicillin 
resistance gene of E. coli and S. cerevisiae TRP1 gene, and 
a promoter derived from a highly-expressed gene to direct 
transcription of a downstream structural sequence. Such 
promoters can be derived from operons encoding glycolytic 
enzymes such as 3-phosphoglycerate kinase (PGK), α-factor, 
acid phosphatase, or heat shock proteins, among others. The 
heterologous structural sequence is assembled in appropriate 
phase with translation initiation and termination sequences, 






wo 97/48416 



PCT/US97/10784 



and preferably, a leader sequence capable of directing 
secretion of translated enzyme. Optionally, the heterologous 
sequence can encode a fusion enzyme including an N-terminal 
identification peptide imparting desired characteristics, 
e.g., stabilization or simplified purification of expressed 
recombinant product. 

Useful expression vectors for bacterial use are 
constructed by inserting a structural DNA sequence encoding 
a desired protein together with suitable translation 
initiation and termination signals in operable reading phase 
with a functional promoter. The vector will comprise one or 
more phenotypic selectable markers and an origin of 
replication to ensure maintenance of the vector and to, if 
desirable, provide amplification within the host. Suitable 
prokaryotic hosts for transformation include E. coli, 
Bacillus subtilis, Salmonella typhimurium and various species 
within the genera Pseudomonas, Streptomyces, and 
Staphylococcus, although others may also be employed as a 
matter of choice. 

As a representative but nonlimiting example, useful 
expression vectors for bacterial use can comprise a 
selectable marker and bacterial origin of replication derived 
from commercially available plasmids comprising genetic 
elements of the well known cloning vector pBR322 (ATCC 
37017). Such commercial vectors include, for example, 
pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden) and 
pGEM1 (Promega Biotec, Madison, WI, USA). These pBR322 
"backbone" sections are combined with an appropriate promoter 
and the structural sequence to be expressed. 

Following transformation of a suitable host strain and 
growth of the host strain to an appropriate cell density, the 
selected promoter is induced by appropriate means (e.g., 
temperature shift or chemical induction) and cells are 
cultured for an additional period. 

Cells are typically harvested by centrifugation, 
disrupted by physical or chemical means, and the resulting 
crude extract retained for further purification. 

Microbial cells employed in expression of proteins can 
be disrupted by any convenient method, including freeze-thaw 
cycling, sonication, mechanical disruption, or use of cell 
lysing agents; such methods are well known to those skilled 
in the art. 

Various mammalian cell culture systems can also be 
employed to express recombinant protein. Examples of 
mammalian expression systems include the COS-7 lines of 
monkey kidney fibroblasts, described by Gluzman, Cell, 23:175 
(1981), and other cell lines capable of expressing a 
compatible vector, for example, the C127, 3T3, CHO, HeLa and 
BHK cell lines. Mammalian expression vectors will comprise 
an origin of replication, a suitable promoter and enhancer, 
and also any necessary ribosome binding sites, 
polyadenylation site, splice donor and acceptor sites, 
transcriptional termination sequences, and 5' flanking 
nontranscribed sequences. DNA sequences derived from the 
SV40 splice and polyadenylation sites may be used to provide 
the required nontranscribed genetic elements. 

The enzyme can be recovered and purified from 
recombinant cell cultures by methods including ammonium 
sulfate or ethanol precipitation, acid extraction, anion or 
cation exchange chromatography, phosphocellulose 
chromatography, hydrophobic interaction chromatography, 
affinity chromatography, hydroxylapatite chromatography and 
lectin chromatography. Protein refolding steps can be used, 
as necessary, in completing configuration of the mature 
protein. Finally, high performance liquid chromatography 
(HPLC) can be employed for final purification steps. 

The enzymes of the present invention may be a naturally 
purified product, or a product of chemical synthetic 
procedures, or produced by recombinant techniques from a 
prokaryotic or eukaryotic host (for example, by bacterial, 
yeast, higher plant, insect and mammalian cells in culture). 
Depending upon the host employed in a recombinant production 
procedure, the enzymes of the present invention may be 
glycosylated or may be non-glycosylated. Enzymes of the 
invention may or may not also include an initial methionine 
amino acid residue. 

Phosphatases are a group of key enzymes in the removal 
of phosphate groups from organophosphate ester compounds. 
There are numerous phosphatases, including alkaline 
phosphatases, phosphodiesterases and phytases. 

The general application and definitions of such 
compounds are discussed above under the background of the 
invention section. 

The present invention provides novel phosphatase enzymes 
having enhanced thermostability. Such phosphatases are 
beneficial in enzyme labeling processes and in certain 
recombinant DNA techniques, such as in the dephosphorylation 
of vector DNA prior to insert DNA ligation. The recombinant 
phosphatase enzymes provide the proteins in a format amenable 
to efficient production of pure enzyme, which can be utilized 
in a variety of applications as described herein. 

Antibodies generated against the enzymes corresponding 
to a sequence of the present invention can be obtained by 
direct injection of the enzymes into an animal or by 
administering the enzymes to an animal, preferably a 
nonhuman. The antibody so obtained will then bind the 
enzymes themselves. In this manner, even a sequence encoding 
only a fragment of the enzymes can be used to generate 
antibodies binding the whole native enzymes. Such antibodies 
can then be used to isolate the enzyme from cells expressing 
that enzyme. 

For preparation of monoclonal antibodies, any technique 
which provides antibodies produced by continuous cell line 
cultures can be used. Examples include the hybridoma 
technique (Kohler and Milstein, Nature, 256:495-497, 1975), 
the trioma technique, the human B-cell hybridoma technique 
(Kozbor et al., Immunology Today 4:72, 1983), and the 
EBV-hybridoma technique to produce human monoclonal antibodies 
(Cole et al., in Monoclonal Antibodies and Cancer Therapy, 
Alan R. Liss, Inc., pp. 77-96, 1985). 

Techniques described for the production of single chain 
antibodies (U.S. Patent 4,946,778) can be adapted to produce 
single chain antibodies to immunogenic enzyme products of 
this invention. Also, transgenic mice may be used to express 
humanized antibodies to immunogenic enzyme products of this 
invention. 

Antibodies generated against an enzyme of the present 
invention may be used in screening for similar enzymes from 
other organisms and samples. Such screening techniques are 
known in the art; for example, one such screening assay is 
described in Sambrook and Maniatis, Molecular Cloning: A 
Laboratory Manual (2d Ed.), vol. 2: Section 8.49, Cold Spring 
Harbor Laboratory, 1989, which is hereby incorporated by 
reference in its entirety. 

The present invention will be further described with 
reference to the following examples; however, it is to be 
understood that the present invention is not limited to such 
examples. All parts or amounts, unless otherwise specified, 
are by weight. 

In order to facilitate understanding of the following 
examples certain frequently occurring methods and/or terms 
will be described. 

"Plasmids" are designated by a lower case "p" preceded 
and/or followed by capital letters and/or numbers. The 
starting plasmids herein are either commercially available, 
publicly available on an unrestricted basis, or can be 
constructed from available plasmids in accord with published 
procedures. In addition, equivalent plasmids to those 
described are known in the art and will be apparent to the 
ordinarily skilled artisan. 

"Digestion" of DNA refers to catalytic cleavage of the 
DNA with a restriction enzyme that acts only at certain 
sequences in the DNA. The various restriction enzymes used 
herein are commercially available and their reaction 
conditions, cofactors and other requirements were used as 
would be known to the ordinarily skilled artisan. For 
analytical purposes, typically 1 µg of plasmid or DNA 
fragment is used with about 2 units of enzyme in about 20 µl 
of buffer solution. For the purpose of isolating DNA 
fragments for plasmid construction, typically 5 to 50 µg of 
DNA are digested with 20 to 250 units of enzyme in a larger 
volume. Appropriate buffers and substrate amounts for 
particular restriction enzymes are specified by the 
manufacturer. Incubation times of about 1 hour at 37°C are 
ordinarily used, but may vary in accordance with the 
supplier's instructions. After digestion the reaction is 
electrophoresed directly on a polyacrylamide gel to isolate 
the desired fragment. 
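
As an illustration of sequence-specific cleavage, the following sketch performs an in-silico digestion of a linear DNA molecule and returns the resulting fragments. The function name, the example sequence, and the choice of EcoRI (recognition site GAATTC, cutting after the first G on the top strand) are illustrative assumptions; only the top strand is modeled.

```python
def digest(dna: str, recognition_site: str, cut_offset: int) -> list[str]:
    """Cut a linear DNA sequence at every occurrence of a recognition site.

    cut_offset is the number of bases into the site at which the top
    strand is cleaved (hypothetical parameter; 1 for EcoRI, G^AATTC).
    """
    dna = dna.upper()
    site = recognition_site.upper()
    if not site or not 0 <= cut_offset <= len(site):
        raise ValueError("cut_offset must lie within the recognition site")
    # Locate every occurrence of the site and record its cut position.
    cut_positions = []
    index = dna.find(site)
    while index != -1:
        cut_positions.append(index + cut_offset)
        index = dna.find(site, index + 1)
    # Split the sequence at the recorded positions.
    fragments = []
    previous = 0
    for cut in cut_positions:
        fragments.append(dna[previous:cut])
        previous = cut
    fragments.append(dna[previous:])
    # Drop empty fragments produced by cuts at the very ends.
    return [fragment for fragment in fragments if fragment]


fragments = digest("AAGAATTCTTGAATTCGG", "GAATTC", 1)
print(fragments)
print([len(fragment) for fragment in fragments])
```

The fragment lengths are the quantities that size separation on a gel would resolve.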

Size separation of the cleaved fragments is performed 
using 8 percent polyacrylamide gel described by Goeddel et 
al., Nucleic Acids Res., 8:4057 (1980). 

"Oligonucleotides" refers to either a single stranded 
polydeoxynucleotide or two complementary polydeoxynucleotide 
strands which may be chemically synthesized. Such synthetic 
oligonucleotides have no 5' phosphate and thus will not 
ligate to another oligonucleotide without adding a phosphate 
with an ATP in the presence of a kinase. A synthetic 
oligonucleotide will ligate to a fragment that has not been 
dephosphorylated. 

"Ligation" refers to the process of forming 
phosphodiester bonds between two double stranded nucleic acid 
fragments (Maniatis, T., et al., Id., p. 146). Unless 
otherwise provided, ligation may be accomplished using known 
buffers and conditions with 10 units of T4 DNA ligase 
("ligase") per 0.5 µg of approximately equimolar amounts of 
the DNA fragments to be ligated. 

Unless otherwise stated, transformation was performed as 
described in Sambrook and Maniatis, Molecular Cloning: A 
Laboratory Manual, Cold Spring Harbor Laboratory, 1989. 



One means for isolating the nucleic acid molecules 
encoding the enzymes of the present invention is to probe a 
gene library with a natural or artificially designed probe 
using art recognized procedures (see, for example: Current 
Protocols in Molecular Biology, Ausubel F.M. et al. (EDS.) 
Green Publishing Company Assoc. and John Wiley Interscience, 
New York, 1989, 1992). It is appreciated by one skilled in 
the art that the polynucleotides of SEQ ID NOS: 1-16, or 
fragments thereof (comprising at least 10 or 12 contiguous 
nucleotides), are particularly useful probes. Other 
particularly useful probes for this purpose are 
fragments hybridizable to the sequences of SEQ ID NOS: 19-27 
(i.e., comprising at least 10 or 12 contiguous nucleotides). 



It is also appreciated that such probes can be and are 
preferably labeled with an analytically detectable reagent to 
facilitate identification of the probe. Useful reagents 
include but are not limited to radioactivity, fluorescent 
dyes or enzymes capable of catalyzing the formation of a 
detectable product. The probes are thus useful to isolate 
complementary copies of DNA from other sources or to screen 
such sources for related sequences. 

With respect to nucleic acid sequences which hybridize 
to specific nucleic acid sequences disclosed herein, 
hybridization may be carried out under conditions of reduced 
stringency, medium stringency or even stringent conditions. 
As an example of oligonucleotide hybridization, a polymer 
membrane containing immobilized denatured nucleic acids is 
first prehybridized for 30 minutes at 45°C in a solution 
consisting of 0.9 M NaCl, 50 mM NaH2PO4, pH 7.0, 5.0 mM 
Na2EDTA, 0.5% SDS, 10X Denhardt's, and 0.5 mg/mL 
polyriboadenylic acid. Approximately 2 X 10^7 cpm (specific 
activity 4-9 X 10^8 cpm/ug) of 32P end-labeled oligonucleotide 
probe are then added to the solution. After 12-16 hours of 
incubation, the membrane is washed for 30 minutes at room 
temperature in 1X SET (150 mM NaCl, 20 mM Tris hydrochloride, 
pH 7.8, 1 mM Na2EDTA) containing 0.5% SDS, followed by a 30 
minute wash in fresh 1X SET at Tm -10°C for the 
oligonucleotide probe. The membrane is then exposed to 
autoradiographic film for detection of hybridization signals. 
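The wash temperature above is stated relative to the Tm of the oligonucleotide probe, but the text does not say how Tm is to be estimated. The sketch below assumes the common Wallace rule for short oligonucleotides (2°C per A or T, 4°C per G or C); this rule is an assumption for illustration only, and both function names are hypothetical.

```python
def wallace_tm(oligo):
    """Estimate Tm in degrees C with the Wallace rule (an assumption here)."""
    seq = "".join(oligo.split()).upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def stringent_wash_temperature(oligo, offset=10):
    """Temperature of the final wash: Tm minus 10 degrees C, per the text."""
    return wallace_tm(oligo) - offset


# A 12-nucleotide fragment taken from the 3' end of SEQ ID NO:2.
probe = "CCTGCGGGTGCG"
print(wallace_tm(probe), stringent_wash_temperature(probe))
```

Other Tm estimates (for example, salt-adjusted formulas) would give different values; only the Tm -10°C offset comes from the protocol itself.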

Stringent conditions means hybridization will occur only 
if there is at least 90% identity, preferably 95% identity 
and most preferably at least 97% identity between the 
sequences. See J. Sambrook et al., Molecular Cloning, A 
Laboratory Manual (2d Ed. 1989) (Cold Spring Harbor 
Laboratory), which is hereby incorporated by reference in its 
entirety. 

"Identity" as the term is used herein, refers to a 
polynucleotide sequence which comprises a percentage of the 
same bases as a reference polynucleotide (SEQ ID NOS:1-16). 
For example, a polynucleotide which is at least 90% identical 
to a reference polynucleotide, has polynucleotide bases which 
are identical in 90% of the bases which make up the reference 
polynucleotide and may have different bases in 10% of the 
bases which comprise that polynucleotide sequence. 
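The definition of identity above can be expressed as a short calculation. This is a minimal sketch that assumes the two sequences are already aligned and of equal length; the function name percent_identity and the one-base variant used in the example are hypothetical.

```python
def percent_identity(reference, candidate):
    """Percentage of reference positions at which the candidate has the same base.

    Assumes pre-aligned sequences of equal length (no gaps).
    """
    ref = "".join(reference.split()).upper()
    cand = "".join(candidate.split()).upper()
    if len(ref) != len(cand):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(r == c for r, c in zip(ref, cand))
    return 100.0 * matches / len(ref)


# First ten bases of SEQ ID NO:2 versus a hypothetical variant with one change.
print(percent_identity("CCGAGGATCC", "CCGAGGATCA"))
```

Sequences of unequal length would first need an alignment step, which is outside the scope of this sketch.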

The present invention relates to polynucleotides which 
differ from the reference polynucleotide such that the 
differences are silent changes, for example, the amino acid 
sequence encoded by both polynucleotides is the same. The 
present invention also relates to nucleotide changes which 
result in amino acid substitutions, additions, deletions, 
fusions and truncations in the polypeptide encoded by the 
reference polynucleotide. In a preferred aspect of the 
invention these polypeptides retain the same biological 
action as the polypeptide encoded by the reference 
polynucleotide. 

The polynucleotides of this invention were recovered 
from genomic gene libraries from the organisms listed in 
Table 1. Gene libraries were generated in the Lambda ZAP II 
cloning vector (Stratagene Cloning Systems). Mass excisions 
were performed on these libraries to generate libraries in 
the pBluescript phagemid. Libraries were generated and 
excisions were performed according to the protocols/methods 
hereinafter described. 

The excision libraries were introduced into the E. coli 
strain BW14893 F'kanlA. Expression clones were then 
identified using a high temperature filter assay using 
phosphatase buffer containing 1 mg/ml BCIP 
(5-bromo-4-chloro-3-indolyl phosphate). Expression clones encoding BCIPases 
were identified and repurified from the following organisms: 
Ammonifex degensii KC4, Methanococcus igneus Kol5, 
Thermococcus alcaliphilus AEDII12RA, Thermococcus celer, 
Thermococcus GU5L5, OC9a, M11TL, Thermococcus CL-2 and 
Aquifex VF-5. 

Expression clones were identified by use of a high 
temperature filter assay with either acid phosphatase buffer 
or alkaline phosphatase buffer containing BCIP (Metcalf, et 
al., Evidence for two phosphonate degradative pathways in 
Enterobacter aerogenes, J. Bacteriol., 174:2501-2510 (1992)). 

BCIPase activity was tested as follows: An excision 
library was introduced into the E. coli strain BW14893 F'kan, 
a pho- pnh- lac- strain. After growth on 100 mm LB plates 
containing 100 ug/ml ampicillin, 80 ug/ml methicillin and 1 mM 
IPTG, colony lifts were performed using Millipore HATF 
membrane filters. The colonies transferred to the filters 
were lysed with chloroform vapor in 150 mm glass petri 
dishes. The filters were transferred to 100 mm glass petri 
dishes containing a piece of Whatman 3MM filter paper 
saturated with either acid phosphatase buffer (see recipe 
below) or alkaline phosphatase buffer (see recipe below) 
containing no BCIP. The dish was placed in the oven at 
80-85°C for 30-45 minutes to heat inactivate endogenous E. coli 
phosphatases. The filters bearing lysed colonies were then 
transferred to a 100 mm glass petri dish containing 3MM paper 
saturated with either acid phosphatase buffer or alkaline 
phosphatase buffer containing 1 mg/ml BCIP. The dish was 
placed in the oven at 80-85°C. 

Alkaline Phosphatase Buffer (referenced in Sambrook, J. 
et al. (1989) Molecular Cloning, A Laboratory Manual, p. 
1874) includes 100 mM NaCl, 5 mM MgCl2 and 100 mM Tris-HCl (pH 
9.5). Clones expressing phosphatase activity (when the 
alkaline phosphatase buffer was used) were derived from 
libraries derived from the organisms identified above. 

Acid Phosphatase Buffer includes 100 mM NaCl, 5 mM MgCl2 
and 100 mM Tris-HCl (pH 6.8). Clones expressing phosphatase 
activity (when the acid phosphatase buffer was used) were 
derived from the library derived from M11TL. 

'Positives' were observed as blue spots on the filter 
membranes. The following filter rescue technique was used to 
retrieve plasmid from lysed positive colonies. 

Filter Rescue Technique: A pasteur pipette (or glass 
capillary tube) was used to core blue spots on the filter 
membrane. The small filter disk was placed in an Eppendorf 
tube containing 20 ul of deionized water. The Eppendorf tube 
was incubated at 75°C for 5 minutes followed by vortexing to 
elute plasmid DNA off the filter. Plasmid DNA containing DNA 
inserts from Thermococcus alcaliphilus AEDII12RA was used to 
transform electrocompetent E. coli DH10B cells. 
Electrocompetent BW14893 F'kanlA E. coli cells were used for 
transformation of plasmid DNA containing inserts from 
Ammonifex degensii KC4, Methanococcus igneus Kol5, and 
Thermococcus GU5L5. The filter-lift assay was repeated on 
transformation plates to identify 'positives.' The 
transformation plates were returned to a 37°C incubator to 
regenerate colonies. 3 ml of LBamp liquid was inoculated 
with repurified positives and incubated at 37°C overnight. 
Plasmid DNA was isolated from these cultures and plasmid 
inserts were sequenced. 

In some instances where the plates used for the initial 
colony lifts contained non-confluent colonies, a specific 
colony corresponding to a blue spot on the filter could be 
identified on a regenerated plate and repurified directly, 
instead of using the filter rescue technique. This 
"repurification" protocol was used for plasmid DNA containing 
inserts from the following: Ammonifex degensii KC4, 
Thermococcus celer, M11TL, and Aquifex VF-5. 

The filter rescue technique was used for DNA from the 
following organisms: Ammonifex degensii KC4, Methanococcus 
igneus Kol5, Thermococcus alcaliphilus AEDII12RA, 
Thermococcus CL-2, and OC9a. 

Phosphatases are a group of key enzymes that remove 
phosphate groups from organophosphate ester compounds . The 
most important phosphatases for commercial purposes are 
alkaline phosphatases, phosphodiesterases, and phytases . 

Alkaline phosphatases have several commercial 
applications, including their use in analytical applications 
as an enzyme label in ELISA immunoassays and enzyme-linked 
gene probes, and their use in research applications for 
removing 5' phosphates in polynucleotides prior to 
end-labeling and for dephosphorylating vectors prior to insert 
ligation (see also Current Protocols in Molecular Biology, 
(John Wiley & Sons) (1995), chapter 3, section 10). 

Alkaline phosphatase hydrolyzes monophosphate esters, 
releasing inorganic phosphate and the cognate alcohol 
compound. It is non-specific with respect to the alcohol 
moiety, a feature which accounts for the many uses of this 
enzyme. The enzyme has a pH optimum between 9 and 10; 
however, it can also work at neutral pH. (From a study of 
the enzyme industry conducted by Business Communications, 
Co., Inc., 25 Van Zant Street, Norwalk, CT 06855, 1995.) 

Two sources of alkaline phosphatase dominate and compete 
in the market: animal, from bovine and calf intestinal 
mucosa, and bacterial, from E. coli. Due to the high 
turnover number of calf intestinal phosphatase, it is often 
selected as the label in many enzyme immunoassays. The 
usefulness of calf alkaline phosphatase is limited by its 
inherently low thermal stability, which is even further 
compromised during the chemical preparation of 
enzyme:antibody conjugates. Bacterial alkaline phosphatase could be 
an attractive alternative to calf alkaline phosphatase due to 
bacterial alkaline phosphatase's extreme thermotolerance at 
temperatures as high as 95°C (Tomazic-Allen S.J., 
Recombinant bacterial alkaline phosphatase as an 
immunodiagnostic enzyme, Annales de Biologie Clinique, 1991, 
49(5):287-90). 

Antibodies generated against the enzymes corresponding 
to a sequence of the present invention can be obtained by 
direct injection of the enzymes into an animal or by 
administering the enzymes to an animal, preferably a 
nonhuman. The antibody so obtained will then bind the 
enzymes themselves. In this manner, even a sequence encoding 
only a fragment of the enzymes can be used to generate 
antibodies binding the whole native enzymes. Such antibodies 
can then be used to isolate the enzyme from cells expressing 
that enzyme. 

For preparation of monoclonal antibodies, any technique 
which provides antibodies produced by continuous cell line 
cultures can be used. Examples include the hybridoma 
technique (Kohler and Milstein, 1975, Nature, 256:495-497), 
the trioma technique, the human B-cell hybridoma technique 
(Kozbor et al., 1983, Immunology Today 4:72), and the 
EBV-hybridoma technique to produce human monoclonal antibodies 
(Cole, et al., 1985, in Monoclonal Antibodies and Cancer 
Therapy, Alan R. Liss, Inc., pp. 77-96). 

Techniques described for the production of single chain 
antibodies (U.S. Patent 4,946,778) can be adapted to produce 
single chain antibodies to immunogenic enzyme products of 
this invention. Also, transgenic mice may be used to express 
humanized antibodies to immunogenic enzyme products of this 
invention. 

Antibodies, as described above, may be employed as a 
probe to screen a library to identify the above-described 
activities or cross-reactive activities in gene libraries 
generated from the organisms described above or other 
organisms. 

Example 1 

Bacterial Expression and Purification of Alkaline 
Phosphatase Enzymes 

DNA encoding the enzymes of the present invention, SEQ 
ID NOS:1 through 16, was initially amplified from a 
pBluescript vector containing the DNA by the PCR technique 
using the primers noted herein. The amplified sequences were 
then inserted into the respective pQE vector listed beneath 
the primer sequences, and the enzyme was expressed according 
to the protocols set forth herein. The 5' and 3' 
oligonucleotide primer sequences used for subcloning and 
vectors for the respective genes are as follows: 

Ammonifex degensii KC4 - 3A1A 

5' CCGA GAA TTC ATT AAA GAG GAG AAA TTA ACT ATG GGG GCA GGT CGG AAA AGG 3' 
5' CCGA GGA TCC TCA CCG CCC CCT GCG GGT GCG 3' 

Vector: pQET3 

Methanococcus igneus Kol5 - 9A1A 

5' CCGA GAA TTC ATT AAA GAG GAG AAA TTA ACT ATG TTG GAT ATA CTG CTT GTT 3' 
5' CCGA CGA TCC TTA TTT TTT AAC CAA ATGT TCC 3' 

Vector: pQET3 

Thermococcus alcaliphilus AEDII12RA - ISA 

5' CCGA CAA TTG ATT AAA GAG GAG AAA TTA ACT ATG ATG ATG GAA TTC ACT CGC 3' 
5' CGGA GGA TCC CTA CAG TTC TAA AAG TCT TTT A 3' 

Vector: pQET3 

Thermococcus celer - 25A1A (incorporating MfeI restriction site) 

5' CCGA CAA TTG ATT AAA GAG GAG AAA TTA ACT ATG AGA ACC CTG ACA ATA AAC 3' 
5' CCGA GGA TCC TTA CAC CCA CAG AAC CCT TAG 3' 

Vector: pQET3 

Thermococcus GU5L5 - 26A1A 

5' CCGA GAA TTC ATT AAA GAG GAG AAA TTA ACT ATG AAA GGA AAG TCT CTT GTT 3' 
5' CCGA GGA TCC TCA AGC TTC CTG GAG AAT CAA 3' 

Vector: pQET3 

OC9a - 27A3A 

5' CCGA GAA TTC ATT AAA GAG GAG AAA TTA ACT ATG CCA AGA AAT ATC GCC GCT 3' 
5' CCGA GGA TCC TTA AGG CTT CTC GAG GTG GGG GTT 3' 

Vector: pQET3 

M11TL - 29A1A (incorporating MfeI restriction site) 

5' CCGA CAA TTG ATT AAA GAG GAG AAA TTA ACT ATG TAT AAA TGG ATT ATT GAG GG 3' 
5' CCGA GGA CTA AAC ATA GTC TAA GTA ATT AGC 3' 

Vector: pQET3 

Thermococcus CL-2 - 30A1A 

5' CCGA GAA TTC ATT AAA GAG GAG AAA TTA ACT ATG AGA ATC CTC CTC ACC AAC 3' 
5' CCGA GGA TCC TCA CAG GCT CAG AAG CCT TTG 3' 

Vector: pQET3 

Aquifex VF-5 - 34A1A 

5' CCGA GAA TTC ATT AAA GAG GAG AAA TTA ACT ATG GAA AAC TTA AAA AAG TAC CT 3' 
5' CCGA GGA TCC TCA CCG CCC CCT GCG GGT GCG 3' 

Vector: pQET3 

The restriction enzyme sites indicated correspond to the 
restriction enzyme sites on the bacterial expression vector 
indicated for the respective gene (Qiagen, Inc., Chatsworth, 
CA). The pQE vector encodes antibiotic resistance (Amp^r), a 
bacterial origin of replication (ori), an IPTG-regulatable 
promoter operator (P/O), a ribosome binding site (RBS), a 
6-His tag and restriction enzyme sites. 
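The restriction sites carried by each primer can be located with a simple string search. The sketch below is illustrative only: the function name find_sites is hypothetical, and the enzyme-to-site table lists the standard recognition sequences for EcoRI, MfeI, BamHI and BglII (of these, only MfeI is named explicitly in the text). The primers are taken from the sequence listing.

```python
# Standard recognition sequences; only MfeI is named in the text itself.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "MfeI": "CAATTG",
    "BamHI": "GGATCC",
    "BglII": "AGATCT",
}


def find_sites(primer):
    """Map enzyme name to the 0-based position of its first site in the primer."""
    seq = "".join(primer.split()).upper()
    hits = {}
    for enzyme, site in RECOGNITION_SITES.items():
        position = seq.find(site)
        if position != -1:
            hits[enzyme] = position
    return hits


seq_id_no_1 = "CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGGGGGCA GGTCCGAAAA GG"
seq_id_no_2 = "CCGAGGATCC TCACCGCCCC CTGCGGGTGC G"
seq_id_no_7 = "CCGACAATTG ATTAAAGAGG AGAAATTAAC TATGAGAACC CTGACAATAA AC"

for name, primer in [("SEQ ID NO:1", seq_id_no_1),
                     ("SEQ ID NO:2", seq_id_no_2),
                     ("SEQ ID NO:7", seq_id_no_7)]:
    print(name, find_sites(primer))
```

In each of these primers the site sits immediately after the four-base 5' overhang, which is consistent with its role as the cloning site.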

The pQE vector was digested with the restriction enzymes 
indicated. The amplified sequences were ligated into the 
respective pQE vector and inserted in frame with the sequence 
encoding for the RBS. The native stop codon was incorporated 
so the genes were not fused to the His tag of the vector. 
The ligation mixture was then used to transform the E. coli 
strain M15/pREP4 (Qiagen, Inc.) by electroporation. 
M15/pREP4 contains multiple copies of the plasmid pREP4, 
which expresses the lacI repressor and also confers kanamycin 
resistance (Kan^r). Transformants were identified by their 
ability to grow on LB plates and ampicillin/kanamycin 
resistant colonies were selected. Plasmid DNA was isolated 
and confirmed by restriction analysis. Clones containing the 
desired constructs were grown overnight (O/N) in liquid 
culture in LB media supplemented with both Amp (100 ug/ml) 
and Kan (25 ug/ml). The O/N culture was used to inoculate a 
large culture at a ratio of 1:100 to 1:250. The cells were 
grown to an optical density 600 (O.D.600) of between 0.4 and 
0.6. IPTG ("isopropyl-B-D-thiogalactopyranoside") was then 
added to a final concentration of 1 mM. IPTG induces by 
inactivating the lacI repressor, clearing the P/O leading to 
increased gene expression. Cells were grown an extra 3 to 4 
hours. Cells were then harvested by centrifugation. 

The primer sequences set out above may also be employed 
to isolate the target gene from the deposited material by 
hybridization techniques described above. 

Example 2 

Isolation of a Selected Clone From the Deposited Genomic 
Clones 

A clone is isolated directly by screening the deposited 
material using the oligonucleotide primers set forth in 
Example 1 for the particular gene desired to be isolated. 
The specific oligonucleotides are synthesized using an 
Applied Biosystems DNA synthesizer. 

The two oligonucleotide primers corresponding to the 
gene of interest are used to amplify the gene from the 
deposited material. A polymerase chain reaction is carried 
out in 25 ul of reaction mixture with 0.1 ug of the DNA of 
the gene of interest. The reaction mixture is 1.5-5 mM MgCl2, 
0.01% (w/v) gelatin, 20 uM each of dATP, dCTP, dGTP, dTTP, 25 
pmol of each primer and 1.25 Unit of Taq polymerase. Thirty 
cycles of PCR (denaturation at 94°C for 1 min; annealing at 
55°C for 1 min; elongation at 72°C for 1 min) are performed 
with the Perkin-Elmer Cetus 9600 thermal cycler. The 
amplified product is analyzed by agarose gel electrophoresis 
and the DNA band with expected molecular weight is excised 
and purified. The PCR product is verified to be the gene of 
interest by subcloning and sequencing the DNA product. The 
ends of the newly purified genes are nucleotide sequenced to 
identify full length sequences. Complete sequencing of full 
length genes is then performed by Exonuclease III digestion 
or primer walking. 
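
The reaction conditions given in this example can be converted into absolute amounts per 25 ul reaction with simple unit arithmetic. The following is a minimal sketch; the function names are hypothetical, and the total hold time ignores ramping between temperatures, which the text does not specify.

```python
def picomoles(concentration_uM, volume_ul):
    """uM x ul = pmol."""
    return concentration_uM * volume_ul


def nanomoles(concentration_mM, volume_ul):
    """mM x ul = nmol."""
    return concentration_mM * volume_ul


def cycling_minutes(cycles, step_minutes):
    """Total hold time across all cycles, excluding temperature ramps."""
    return cycles * sum(step_minutes)


reaction_volume_ul = 25
print("each dNTP (pmol):", picomoles(20, reaction_volume_ul))
print("MgCl2 range (nmol):", nanomoles(1.5, reaction_volume_ul),
      "to", nanomoles(5, reaction_volume_ul))
print("hold time (min):", cycling_minutes(30, (1, 1, 1)))
```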

Numerous modifications and variations of the present 
invention are possible in light of the above teachings and, 
therefore, within the scope of the appended claims, the 
invention may be practiced otherwise than as particularly 
described. 

SEQUENCE LISTING 

GENERAL INFORMATION: 

(i) APPLICANT: 

RECOMBINANT BIOCATALYSIS, INC. 

(ii) TITLE OF INVENTION: 

THERMOSTABLE PHOSPHATASES 

(iii) NUMBER OF SEQUENCES: S4 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: FISH & RICHARDSON 

(B) STREET: 4225 EXECUTIVE SQUARE, STE. 1400 

(C) CITY: LA JOLLA 

(D) STATE: CA 

(E) COUNTRY: USA 

(F) ZIP: 92037 



(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: 3.5 INCH DISKETTE 

(B) COMPUTER: IBM PS/2 

(C) OPERATING SYSTEM: MS-DOS 
(D) SOFTWARE: WORD PERFECT 6.0 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: Unassigned 

(B) FILING DATE: June 19, 1997 

(C) CLASSIFICATION: Unassigned 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Haile, Lisa A. 

(B) REGISTRATION NUMBER: 38,347 

(C) REFERENCE/DOCKET NUMBER: 09010/015WO1 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 619-676-5070 

(B) TELEFAX: 619-678-5099 

(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 52 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS : SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGGGGGCA GGTCCGAAAA GG 

(2) INFORMATION FOR SEQ ID NO : 2 : 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 31 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS : SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 2 : 

CCGAGGATCC TCACCGCCCC CTGCGGGTGC G 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 52 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGTTGGAT ATACTGCTTG TT 52 



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 32 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

CCGAGGATCC TTATTTTTTA ACCAAATTTC CC 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 52 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

CCGACAATTG ATTAAAGAGG AGAAATTAAC TATGATGATG GAATTCACTC GC 

(2) INFORMATION FOR SEQ ID NO : 6 : 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 32 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

CGGAGGATCC CTACAGTTCT AAAAGTCTTT TA 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 52 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

CCGACAATTG ATTAAAGAGG AGAAATTAAC TATGAGAACC CTGACAATAA AC 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 31 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

CCGAGGATCC TTACACCCAC AGAACCCTTA C 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 52 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS : SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 9 : 

CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGAAAGGA AAGTCTCTTG TT 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 31 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 10 : 
CCGAGGATCC TCAAGCTTCC TGGAGAATCA A 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 52 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGCCAAGA AATATCGCCG CT 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 34 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

CGGAGGATCC TTAAGGCTTC TCGAGGTGGG GGTT 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 52 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 13 : 
CCGACAATTG ATTAAAGAGG AGAAATTAAC TATGTATAAA TGGATTATTG AGGG 



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 34 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
CCGAGGATCC CTAAACATAG TCTAAGTAAT TAGC 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 52 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGAGAATC CTCCTCACCA AC 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 31 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
CCGAGGATCC TCACAGGCTC AGAAGCCTTT G 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 54 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: GENOMIC DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
CCGAGAATTC ATTAAAGAGG AGAAATTAAC TATGGAAAAC TTAAAAAAGT ACCT 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 31 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS : SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

CGGAAGATCT TCACACCGCC ACTTCCATAT A 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 783 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

ATG AGG GGG AGC GGA GTG CGG ATA CTT CTC ACC AAC GAT GAC GGC ATC 
TTT GCC GAG GGT CTG GGG GCT CTG CGC AAG ATG CTG GAG CCC GTG GCT 
ACC CTT TAG GTG GTG GCT CCG GAC CGA GAG CGT AGC GCG GCC AGC CAT 
GCT ATC ACC GTT CAC CGC CCC CTG CGG GTG CGG GAG GCG GGT TTT CGC 
AGC CCC AGG CTT AAA GGC TGG GTA GTG GAC GGT ACC CCG GCC GAC TGC 
GTC AAG CTG GGC CTG GAG GTA CTT TTG CCC GAA CGT CCA GAT TTC CTG 
GTT TCG GGC ATA AAC TAC GGG CCC AAC CTG GGT ACC GAC GTA CTT TAC 
TCC GGC ACC GTC TCG GCG GCC ATA GAA GGG GTA ATT AAC GGC ATT CCC 
TCG GTG GCC GTA TCT TTG GCC ACG CGG CGG GAG CCG GAC TAT ACC TGG 
GCG GCC CGG TTC GTC CTG GTC CTG CTG GAG GAA CTG CGA AAA CAC CAA 
CTG CCC CCA GGA ACC CTG CTC AAC GTC AAC GTG CCC GAC GGG GTG CCC 
CGC GGG GTC AAG GTG ACC AAA CTG GGA AGC GTA CGC TAC GTC AAC GTG 576 

GTA GAC TGC CGC ACC GAC COT CGG GGG AAG GCT TAC TAC TGG ATG GCG 624 

GGA GAA CCA TTG GAG CTG GAC GGC AAC GAC TCC GAA ACC GAC GTC TGG 672 

GCG GTG CGA GAA GGC TAT ATT TCC GTA ACA CCG GTC CAG ATC GAC CTT 720 

ACT AAC TAC GGC TTC CTG GAA GAA CTC AAA AAA TGG CGT TTC AAG GAT 768 

ATC TTT TCT TCT TAA 783 

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 7S5 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS : SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: genomic DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

ATG TTG GAT ATA CTG CTT GTT AAT GAT GAT GGC ATT TAT TCA AAT GGA 48 

TTA ATA GCT TTG AAG GAT GCA TTA TTG GAA AAA TTT AAT GCG AGG ATT 96 

ACT ATT GTA GCC CCA ACA AAT CAG CAG AGT GGT ATT GGT AGG GCA ATA 144 

AGT TTA TTC GAG CCG TTA AGG ATA ACT AAA ACC AAA TTA GCA GAT GGT 192 

TCT TGG GGA TAT GCA GTT TCA GGA ACC CCA ACA GAT TGC GTT ATA TTG 240 

GGC ATT TAT GAG ATA TTA AAG AAG GTA CCT GAT GTA GTT ATA TCA GGA 288 

ATA AAC ATT GGA GAA AAC CTT GGG ACT GAA ATA ACA ACT TCT GGA ACG 336 

TTG GGG GCT GCG TTT GAA GGG GCC CAT CAT GGG GCT AAG GCA TTA GCA 384 

TCA TCA CTC CAA GTT ACC TCT GAC CAT CTA AAG TTT AAA GAG GGG GAG 432 

ACC CCA ATA GAC TTC ACA GTC CCA GCA AGA ATT ACT GCA AAT GTT GTT 480 

GAG AAG ATG TTG GAT TAT GAT TTC CCA TGT GAT GTC GTC AAC TTA AAC 528 

ATT CCA GAA GGA GCA ACA GAA AAG ACA CCG ATT GAA ATC ACA AGG TTG 576 

GCA AGG AAA ATG TAT ACA ACA CAC GTT GAG GAA AGA ATA GAT CCA AGA 624 

GGG AGG AGT TAT TAT TGG ATT GAT GGG TAT CCT ATT TTA GAG GAA GAG 672 

GAA GAC ACT GAT GTC TAT GTT GTT AGA AGA AAG GGA CAT ATT TCT CTA 720 

ACC CCA TTA ACA TTA GAC ACA ACA ATT AAA AAT TTA GAG GAA TTT AAG 768 

AAA AAA TAT GAG AGA ATA TTA AAT GAA TGA 798 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 765 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 



ATG ATG ATG GAA TTC ACT CGC GAG GGA ATA AAA GCT GCT GTA GAG GCA 48 

CTT CAA GGG TTA GGA GAG ATC TAC GTA GTT GCC CCA ATG TTT CAA AGG 96 

AGC GCA AGT GGA AGG GCA ATG ACC ATC CAC AGA CCT CTA AGG GCT AAA 144 

AGA ATA AGT ATG AAC GGT GCA AAA GCA GCC TAT GCT TTG GAT GGA ATG 192 

CCC GTT [...] GGA GAT TTC GAC 240 

CTT GCA ATA AGT GGT GTA AAC TTG GGA GAA AAC ATG AGC ACC GAG ATA 288 

ACG GTT TCC GGG ACT GCA AGC GCT GCA ATA GAG GCT GCA ACC CAA GAG 336 

ATC CCA AGC ATT CCC ATA AGC CTG GAA GTT AAT AGA GAA AAA CAC AAA 384 

TTT GGT GAG GGC GAA GAG ATT GAC TTC TCA GCT GCC AAG TAT TTC CTA 432 

AGA AAA ATC GCA ACG GCG GTT TTA AAG AGA GGC CTC CCC AAA GGA GTC 480 

GAT ATG CTG AAC GTC AAC GTC CCT TAT GAT GCA AAT GAA AGG ACA GAG 528 

ATA GOT TTT ACT CGC CTG GCA AGA AGG ATG TAT AGG CCT TCT ATT GAA 576 

GAG CGC ATA GAC CCA AAG GGG AAT CCC TAC TAC TGG ATA GTT GGA ACT 624 

CAG TGC OCT AAG GAG GCA TTA GAG CCG GGA ACG GAT ATG TAT GTA GTT 672 

AAA GTT GAG AGA AAA GTT AGC GTG ACT CCA ATA AAC ATT GAT ATG ACA 720 

GCA AGA GTG AAT TTA GAC GAG ATT AAA AGA CTT TTA GAA CTG TAG 765 



(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 816 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 

ATG AGA ACC CTG ACA ATA AAC ACT GAC GCG GAG GGG TTC GTT TTG AGG 48 

ATT CTC CTG ACG AAC GAC GAT GGA ATC TAC TCC AAC GGA CTG CGC GCC 96 

GCT GTG AAA GCC CTG AGT GAG CTC GGC GAA GTT TAC GTC GTT GCC CCC 144 

CTC TTC CAG AGG AGC GCG AGC GGC AGG GCC ATG ACG CTC CAC AGG CCG 192 

ATA AGG GCC AAG CGC GTT GAC GTT CCC GGC GCA AAG ATA GCC TAC GGA 240 

ATA GAT GGA ACT CCT ACT GAC TGC GTG ATT TTC GCC ATA GCC CGC TTC 288 

GGG AGC TTT GGT TTA GCC GTG AGC GGG ATT AAC CTC GGC GAG AAC CTG 336 

AGC ACC GAG ATA ACA GTC TCA GGG ACG GCC TCC GCT GCC ATA GAG GCC 384 

TCA ACT CAT GGA ATT CCG AGC ATA GCG ATT AGC CTT GAG GTG GAG TGG 432 

AAG AAG ACC CTC GGC GAG GGT GAG GGG GTT GAC TTC TCG GTC TCG ACT 480 

CAC TTC CTC AAG AGA ATC GCG GGA GCC CTC TTG GAG AGA GGT CTT CCT 528 

GAG GGC GTT GAC ATG CTC AAC GTC AAC GTT CCG AGC GAC GCG ACG GAG 576 

GAA [...] GAG ATA GCA ATC ACC CGC TTA GCC CGG AAG CGC TAC TCC CCA 624 

ACG GTC GAG GAG AGG ATT GAC CCC AAG GGC AAC CCC TAC TAC TGG ATT 672 

GTC GGC AAA CTT GTC CAA GAC TTC GAG CCA GGG ACA GAT GCC TAC GCC 720 

CTG AAG GTC GAG AGG AAG GTC AGC GTC ACG CCG ATA AAC ATA GAT ATG 768 

ACT GCG AGG GTG GAC TTT GAG GAG CTT GTA AGG GTT CTG TGG GTG TAA 816 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 1494 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS : SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 



ATG AAA GGA AAG TCT CTT GTT AGC GGT CTG TTG TTG GGT CTT TTA ATT 48 

TTG AGC CTG ATT TCA TTC CAG CCA AGC TTT GCA TAC TCC CCA CAC GGC 96 

GGT GTC AAA AAC ATC ATA ATC CTG GTT GGA GAC GGC ATG GGT CTT GGG 144 

CAT GTA GAA ATT ACA AAG CTC GTT TAT GGA CAC TTA AAC ATG GAA AAC 192 

TTT CCA GTT ACT GGA TTT GAG CTT ACT GAT TCC CTA AGT GGT GAA GTT 240 

ACA GAT TCT GCT GCG GCA GGA ACT GCA ATA TCC ACT GGA GCT AAA ACG 288 

TAT AAT GGT ATG ATT TCA GTA ACC AAC ATA ACC GGA AAG ATA GTT AAC 336 

TTA ACA ACC CTA CTT GAA GTG GCT CAA GAG CTT GGG AAG TCA ACA GGG 384 

CTG GTC ACC ACA ACA AGG ATT ACC CAT GCA ACT CCA GCA GTT TTT GCG 432 

TCC CAT GTC CCA GAT AGG GAT ATG GAG GGG GAG ATA CCC AAG CAA CTC 480 

ATA ATG CAC AAA GTT AAC GTC TTG TTG GGT GGT GGA AGG GAG AAA TTC 528 

GAT GAG AAA AAT TTG GAG CTG GCC AAA AAG CAG GGA TAC AAA GTA GTT 576 

TTC ACG AAG GAA GAG CTT GAA AAA GTT GAA GGA GAT TAT GTC CTA GGA 624 

CTC TTT GCA GAA AGT CAC ATC COT TAC GTA TTG GAT AGA AAA CCC GAT 67 2 

GAT GTT GGA CTT TTA GAA ATG GCC AAA AAG GCA ATT TCA ATA CTC GAG 720

AAG AAC CCG AGC GGA TTC TTT CTC ATG GTT GAG GGC GGA AGG ATT GAC 768

CAT GCA GCC CAT GGA AAC GAT GTC GCA TCG GTT GTT GCA GAA ACT AAG 816 

GAG TTT GAC GAT GTT GTC AGA TAC GTG CTG GAA TAT CCG AAG AAG AGG 864 

GGA GAT ACC TTG GTA ATA GTG CTT GCC GAT CAC GAA ACT GGA GGT CTT 912 

GCA ATA GGT CTA ACG TAT GGA AAT GCA ATC GAT GAA GAT GCC ATA AGA 960 

AAA ATA AAA GCA AGC ACG TTG AGG ATG CCC AAA GAG GTT AAG GCA GGG 1008 

AGT AGT GTA AAA GAG TCC TCA AAG GTA TGC CGG ATT TGT CCC AAC AGA 1056

GGA AGA AGT CAG TAT ATT GAG AAT GCG CTG CAC TCG ACA AAC AAG TAT 1104 

GCC CTC TCA AAT GCA GTA GCC GAT GTT ATA AAC AGG CGT ATT GGT GTT 1152 

GGA TTC ACC TCC TAT GAG CAT ACA GGA GTT CCA GTT CCG CTC TTA GCT 1200 

TAC GGT CCC GGG GCA GAG AAC TTC AGA GGT TTC TTA CAC CAT GTG GAT 1248

ACA GCA AGA TTA GTT GCA AAG TTA ATG CTC TTT GGA AGG AGG AAT ATT 1296 

CCA GTT ACC ATT TCA AGC GTG AGC AGT GTT AAG GGA GAC ATA ACC GGT 1344 

GAT TAC AGG GTT GAT GAG AAG GAT GCC TAC GTT ACG CTC ATG ATG TTT 1392 

CTC GGA GAA AAA GTG GAT AAT GAA ATT GAA AAG AGA GTC GAT ATA GAC 1440 

AAC AAC GGC ATG GTT GAC TTA AAT GAC GTC ATG TTG ATT CTC CAG GAA 1488 

GCT TGA 1494 



(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 1755 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: 

ATG CCA AGA AAT ATC GCC GCT GTA TGC GCC CTG GCC GCT TTG TTA GGG 48
TCG GCC TGG GCG GCC AAA GTT GCC GTC TAC CCC TAC GAC GGA GCC GCT 96
TTG CTG GCG GGG CAG CGC TTC GAT TTG CGC ATA GAA GCC TCC GAG CTG 144
AAA GGC AAT TTA AAG GCT TAC CGC ATC ACC CTG GAC GGC CAG CCT CTG 192
GCG GGC CTC GAG CAA ACC GCG CAG GGG GCC GGG CAG GCC GAG TGG ACC 240
CTG CGC GGT GCC TTC CTG CGC CCT GGA AGC CAC ACC CTC GAG GTC AGC 288
CTC ACC GAC GAC GCT GGG GAG AGC AGG AAG AGC GTA CGT TGG GAG GCT 336
CGG CAG AAC CTT CGC TTG CCC CGA GCG GCC AAG AAT GTG ATT CTC TTC 384
ATT GGC GAC GGG ATG GGC TGG AAC ACC CTC AAC GCC GCC CGC ATC ATC 432
GCC AAA GGC TTT AAC CCC GAA AAC GGT ATG CCC AAC GGA AAC CTC GAG 480
ATC GAG AGT GGT TAC GGT GGG ATG GCT ACC GTC ACT ACC GGC AGC TTT 528
GAT AGC TTC ATC GCC GAC TCA GCT AAC TCG GCT TCT TCC ATC ATG ACC 576
GGG CAG AAG GTG CAG GTG AAT GCC CTC AAC GTT TAC CCA TCA AAC CTC 624
AAA GAT ACC CTG GCC TAC CCC CGG ATC GAA ACC CTA GCG GAG ATG CTC 672
AAG CGG GTA CGC GGG GCC AGC ATT GGG GTA GTG ACC ACC ACC TTC GGC 720
ACC GAC GCT ACC CCG GCT TCA CTC AAC GCC CAT ACC CGC CGC CGC GGT 768
GAT TAC CAG GCT ATC GCC GAC ATG TAC TTT GGT AGA GGC GGG TTC GGT 816
GTT CCC TTG GAT GTG ATG CTC TTC GGT GGT TCA CGC GAC TTC ATC CCC 864
CAG AGC ACC CCT GGC TCG CGG CGC AAG GAT AGC ACG GAC TGG ATT GCC 912
GAA TCC CAG AAG CTG GGC TAC ACC TTT GTC AGC ACC CGC AGC GAG CTG 960
CTG GCG GCC AAA CCC ACC GAT AAG CTG TTT GGG CTG TTC AAC ATT GAC 1008
AAC TTC CCC AGC TAC CTA GAC CGC GCA GTG TGG AAG CGG CCC GAG ATG 1056
CTG GGA AGC TTT ACC GAT ATG CCC TAC CTC TGG GAG ATG ACC CAG AAA 1104
GCC GTG GAG GCT CTC TCC AGA AAC GAC AAA GGC TTT TTC TTG ATG GTT 1152
GAG GGG GGA ATG GTG GAT AAG TAC GAG CAC CCC TTG GAC TGG CCC CGC 1200
GCA CTT TGG GAT GTA CTC GAG CTG GAC CGC GCG GTG GCT TGG GCC AAG 1248
GGC TAT GCG GCC TCC CAC CCC GAT ACC CTG GTG ATT GTC ACC GCC GAC 1296
CAC GCT CAC TCG ATC TCG GTG TTT GGC GGT TAC GAC TAC TCC AAG CAG 1344
GGC CGG GAG GGG GTG GGG GTT TAT GAG GCC GCC AAG TTC CCC ACC TAC 1392
GGC GAC AAA AAA GAC GCC AAC GGC TTT CCC TTG CCC GAC ACC ACT CGG 1440
GGA ATC GCG GTA GGC TTC GGG GCC ACG CCG GAT TAC TGT GAA ACC TAC 1488
CGG GGC CGC GAG GTC TAC AAA GAC CCC ACC ATC TCC GAC GGC AAA GGT 1536
GGT TAC GTG GCC AAC CCT GAG GTC TGC AAG GAG CCG GGC CTT CCA ACG 1584
TAC CGG CAA CTC CCA GTA GAT AGC GCC CAG GGC GTG CAC ACG GCT GAT 1632
CCC ATG CCG CTG TTT GCC TTT GGC GTG GGG TCT CAG TTC TTC AAT GGC 1680
CTC ATC GAC CAG ACC GAG ATC TTC TTC CGC ATG GCC CAG GCC CTA GGG 1728
TTC AAC CCC CAC CTC GAG AAG CCT TAA 1755

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 912 NUCLEOTIDES 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:



ATG TAT AAA TGG ATT ATT GAG GGT AAG ... GCC CAA GCA ... TTT CCA 48

AGC CTA GGT GAA CTA GCC GAT CTC AAA AGA ... ... GAC GCC ATT ATT 96

GTT CTT ACA ATG CCG CAT GAA CAA CCG ... AAT GAG AAA TAT ATC GAG 144

ATA TTA GAG AGC CAT GGA ... CAA GTC CTC CAT GTC ... ... CTC GAC 192

TTT CAT CCT TTA GAA CTC TTC GAC ... ... AAA ACA AGC ATA TTC ATT 240

GAT GAA AAC CTG GAG AGA TCC CAC AGA GTG CTT GTC CAC TGC ATG GGA 288

GGC ATA GGC CGG AGC GGG CTT GTA ACT GCT GCG TAC TTA ATA TTC AAA 336

GGT TAT GAT ATT TAC GAC GCG GTA AAG CAT GTG AGA ACG GTA GTG CCT 384

GGT GCT ATT GAA AAC AGA GGG CAA GCG TTA ATG CTT GAG AAC TAC TAT 432

ACC CTG GTC AAA AGT TTC AAC AGA GAG TTG CTG AGA GAC TAC GGG AAG 480

AAA ATT TTC ACG CTC GGT GAC CCG AAG GCG GTT CTC CAC GCT TCT AAG 528

ACG ACT CAG TTC ACG ATT GAA CTC TTA AGC AAC TTA CAC GTC AAC GAG 576

GCG TTT TCA ATC AGT GCG ATG GCT CAA TCA CTG CTC CAC TTT CAC GAC 624

GTA AAA GTC CGC TCT AAA CTG AAA GAA GTA TTC GAA AAC ATG GAA TTC 672

TCA TCC GCC TCA GAG GAG GTT CTG TCA TTT ATT CAC CTA CTC GAT TTC 720

TAT CAG GAT GGC AGG GTT GTT TTA ACC ATT TAC GAT TAT CTC CCC GAT 768

AGG GTG GAT TTG ATT TTA TTG TGT AAG TGG GGT TGT GAT AAA ATA GTT 816

GAA GTC TCG TCT TCA GCG AAG AAA ACC GTT GAG AAG CTT GTA GGA AGA 864

AAG GTT TCC CTA TCC TGG GCT AAT TAC TTA GAC TAT GTT TAG 912



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 774 NUCLEOTIDES 
(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: genomic DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:



ATG AGA ATC CTC CTC ACC AAC GAC GAC GGC ATC TAT TCC AAC GGT CTG 48

CGC GCG GCG GTG AAG GGC CTG AGC GAG CTC GGC GAG GTC TAC GTC GTC 96

GCC CCG CTC TTC CAG AGG AGC GCG AGC GGT CGG GCG ATG ACC CTA CAC 144

AGG CCG ATA AGG GCA AAG AGG GTT GAC GTT CCC GGC GCG AAG ATA GCG 192

TAT GGC ATA GAC GGA ACG CCG ACC GAC TGC GTG ATT TTT GCC ATC GCC 240

CGC TTC GGC GAC TTT GAT CTG GCG GTC AGC GGG ATA AAC CTA GGC GAG 288

AAC CTG AGC ACG GAG ATA ACC GTC TCC GGA ACG GCC TCG GCG GCG ATA 336

GAG GCT TCC ACC CAC ... ATT CCA AGT GTA GCT ATA AGC CTC GAG GTC 384

GAG TGG AAG AAG ACC CTC ... GAG GGG GAG GGT ATT GAC TTC TCG GTT 432

TCA GCA CAC TTC CTG AGA AGG ATA GCG ACG GCT GTC CTT AAG AAG GGC 480

CTG CCT GAA GGG GTG GAC ATG CTC AAC GTG AAC GTC CCT AGC GAC GCC 528

AGC GAG GGG ACT GAG ATC GCC ATA ACG CGC CTC GCG AGG AAG CGC TAT 576

TCT CCG ACG ATA GAG GAG AGG ATA GAC CCC AAG GGC AAC CCC TAC TAC 624

TGG ATC GTT GGC AGG CTC GTC CAG GAG TTC GAG CCG GGC ACG GAC GCC 672

TAG GCT CTG AAA GTC GAG AGA AAG GTC AGC GTC ACG CCC ATA AAC ATC 720

GAC ATG ACT GCG AGG GTT GAC TTT GAG AAC CTT CAA AGG CTT CTG AGC 768

CTG TGA 774































(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 795 NUCLEOTIDES

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: genomic DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:

ATG GAA AAC TTA AAA AAG TAC CTA GAA GTT GCA AAA ATA GCC GCG CTC 48

GCG GGT GGG CAG GTT CTG AAA GAA AAC TTC GGA AAG GTA AAA AAG GAA 96

AAC ATA GAG GAA AAA GGG GAA AAG GAC TTT GTA AGT TAC GTG GAT AAA 144

ACT TCA GAG GAA AGG ATA AAG GAG GTG ATA CTC AAG TTC TTT CCC GAT 192

CAC GAG GTC GTA GGG GAA GAG ATG GGT GCG GAG GGA AGC GGA AGC GAA 240

TAC AGG TGG TTC ATA GAC CCC CTT GAC GGC ACA AAG AAC TAC ATA AAC 288

GGT TTT CCC ATC TTT GCC GTA TCA GTG GGA CTT GTT AAG GGA GAA GAG 336

CCA ATT GTG GGT GCG GTT TAC CTT CCT TAC TTT GAC AAG CTT TAC TGG 384

... GCT ... ... ... ... ... TAC GTA ... ... AAG ... ATA AAG GTA 432

... ... ... ... ... TTA AAG CAC GCC GGA GTG GTT TAC GGA TTT CCC 480

TCT AGG AGC AGG AGG GAC ATA TCT ATC TAC TTG AAC ATA ... AAG GAT 528

GTC TTT TAG GAA GTT GGC TCT ATG AGG AGA CCC GGG GCT GCT GCG GTT 576

GAG CTC TGC ATG GTG GCG GAA GGG ATA ... GAC GGG ATG ATG GAG TTT 624

GAA ATG AAG CCG TGG GAC ATA ACC GCA GGG CTT GTA ATA CTG AAG GAA 672

GCC GGG GGC GTT TAC ACA CTT GTG GGA GAA CCC TTC GGA GTT TCG GAC 720

ATA ATT GCG GGC AAC AAA GCC CTC CAC GAC TTT ATA CTT CAG GTA GCC 768

AAA AAG TAT ATG GAA GTG GCG GTG TGA 795



(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 260 AMINO ACIDS 

(B) TYPE: AMINO ACID 
(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: PROTEIN 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

Met Arg Gly Ser Gly Val Arg Ile Leu Leu Thr Asn Asp Asp Gly Ile
5 10 15

Phe Ala Glu Gly Leu Gly Ala Leu Arg Lys Met Leu Glu Pro Val Ala
20 25 30

Thr Leu Tyr Val Val Ala Pro Asp Arg Glu Arg Ser Ala Ala Ser His
35 40 45

Ala Ile Thr Val His Arg Pro Leu Arg Val Arg Glu Ala Gly Phe Arg
50 55 60

Ser Pro Arg Leu Lys Gly Trp Val Val Asp Gly Thr Pro Ala Asp Cys
65 70 75 80

Val Lys Leu Gly Leu Glu Val Leu Leu Pro Glu Arg Pro Asp Phe Leu
85 90 95

Val Ser Gly Ile Asn Tyr Gly Pro Asn Leu Gly Thr Asp Val Leu Tyr
100 105 110

Ser Gly Thr Val Ser Ala Ala Ile Glu Gly Val Ile Asn Gly Ile Pro
115 120 125

Ser Val Ala Val Ser Leu Ala Thr Arg Arg Glu Pro Asp Tyr Thr Trp
130 135 140

Ala Ala Arg Phe Val Leu Val Leu Leu Glu Glu Leu Arg Lys His Gln
145 150 155 160

Leu Pro Pro Gly Thr Leu Leu Asn Val Asn Val Pro Asp Gly Val Pro
165 170 175

Arg Gly Val Lys Val Thr Lys Leu Gly Ser Val Arg Tyr Val Asn Val
180 185 190

Val Asp Cys Arg Thr Asp Pro Arg Gly Lys Ala Tyr Tyr Trp Met Ala
195 200 205

Gly Glu Pro Leu Glu Leu Asp Gly Asn Asp Ser Glu Thr Asp Val Trp
210 215 220

Ala Val Arg Glu Gly Tyr Ile Ser Val Thr Pro Val Gln Ile Asp Leu
225 230 235 240

Thr Asn Tyr Gly Phe Leu Glu Glu Leu Lys Lys Trp Arg Phe Lys Asp
245 250 255

Ile Phe Ser Ser
260



























(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 265 AMINO ACIDS

(B) TYPE: AMINO ACID
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: PROTEIN

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:

Met Leu Asp Ile Leu Leu Val Asn Asp Asp Gly Ile Tyr Ser Asn Gly
5 10 15

Leu Ile Ala Leu Lys Asp Ala Leu Leu Glu Lys Phe Asn Ala Arg Ile
20 25 30

Thr Ile Val Ala Pro Thr Asn Gln Gln Ser Gly Ile Gly Arg Ala Ile
35 40 45

Ser Leu Phe Glu Pro Leu Arg Ile Thr Lys Thr Lys Leu Ala Asp Gly
50 55 60

Ser Trp Gly Tyr Ala Val Ser Gly Thr Pro Thr Asp Cys Val Ile Leu
65 70 75 80

Gly Ile Tyr Glu Ile Leu Lys Lys Val Pro Asp Val Val Ile Ser Gly
85 90 95

Ile Asn Ile Gly Glu Asn Leu Gly Thr Glu Ile Thr Thr Ser Gly Thr
100 105 110

Leu Gly Ala Ala Phe Glu Gly Ala His His Gly Ala Lys Ala Leu Ala
115 120 125

Ser Ser Leu Gln Val Thr Ser Asp His Leu Lys Phe Lys Glu Gly Glu
130 135 140

Thr Pro Ile Asp Phe Thr Val Pro Ala Arg Ile Thr Ala Asn Val Val
145 150 155 160

Glu Lys Met Leu Asp Tyr Asp Phe Pro Cys Asp Val Val Asn Leu Asn
165 170 175

Ile Pro Glu Gly Ala Thr Glu Lys Thr Pro Ile Glu Ile Thr Arg Leu
180 185 190

Ala Arg Lys Met Tyr Thr Thr His Val Glu Glu Arg Ile Asp Pro Arg
195 200 205

Gly Arg Ser Tyr Tyr Trp Ile Asp Gly Tyr Pro Ile Leu Glu Glu Glu
210 215 220

Glu Asp Thr Asp Val Tyr Val Val Arg Arg Lys Gly His Ile Ser Leu
225 230 235 240

Thr Pro Leu Thr Leu Asp Thr Thr Ile Lys Asn Leu Glu Glu Phe Lys
245 250 255

Lys Lys Tyr Glu Arg Ile Leu Asn Glu
260 265



(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 254 AMINO ACIDS 

(B) TYPE: AMINO ACID 
(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: PROTEIN 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:



Met Met Met Glu Phe Thr Arg Glu Gly Ile Lys Ala Ala Val Glu Ala
5 10 15

Leu Gln Gly Leu Gly Glu Ile Tyr Val Val Ala Pro Met Phe Gln Arg
20 25 30

Ser Ala Ser Gly Arg Ala Met Thr Ile His Arg Pro Leu Arg Ala Lys
35 40 45

Arg Ile Ser Met Asn Gly Ala Lys Ala Ala Tyr Ala Leu Asp Gly Met
50 55 60

Pro Val Asp Cys Val Ile Phe Ala Met Ala Arg Phe Gly Asp Phe Asp
65 70 75 80

Leu Ala Ile Ser Gly Val Asn Leu Gly Glu Asn Met Ser Thr Glu Ile
85 90 95

Thr Val Ser Gly Thr Ala Ser Ala Ala Ile Glu Ala Ala Thr Gln Glu
100 105 110

Ile Pro Ser Ile Pro Ile Ser Leu Glu Val Asn Arg Glu Lys His Lys
115 120 125

Phe Gly Glu Gly Glu Glu Ile Asp Phe Ser Ala Ala Lys Tyr Phe Leu
130 135 140



Arg Lys Ile Ala Thr Ala Val Leu Lys Arg Gly Leu Pro Lys Gly Val
145 150 155 160

Asp Met Leu Asn Val Asn Val Pro Tyr Asp Ala Asn Glu Arg Thr Glu
165 170 175

Ile Ala Phe Thr Arg Leu Ala Arg Arg Met Tyr Arg Pro Ser Ile Glu
180 185 190

Glu Arg Ile Asp Pro Lys Gly Asn Pro Tyr Tyr Trp Ile Val Gly Thr
195 200 205

Gln Cys Pro Lys Glu Ala Leu Glu Pro Gly Thr Asp Met Tyr Val Val
210 215 220

Lys Val Glu Arg Lys Val Ser Val Thr Pro Ile Asn Ile Asp Met Thr
225 230 235 240

Ala Arg Val Asn Leu Asp Glu Ile Lys Arg Leu Leu Glu Leu
245 250



(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 271 AMINO ACIDS 

(B) TYPE: AMINO ACID 
(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: PROTEIN 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31: 



Met Arg Thr Leu Thr Ile Asn Thr Asp Ala Glu Gly Phe Val Leu Arg
5 10 15

Ile Leu Leu Thr Asn Asp Asp Gly Ile Tyr Ser Asn Gly Leu Arg Ala
20 25 30

Ala Val Lys Ala Leu Ser Glu Leu Gly Glu Val Tyr Val Val Ala Pro
35 40 45

Leu Phe Gln Arg Ser Ala Ser Gly Arg Ala Met Thr Leu His Arg Pro
50 55 60

Ile Arg Ala Lys Arg Val Asp Val Pro Gly Ala Lys Ile Ala Tyr Gly
65 70 75 80

Ile Asp Gly Thr Pro Thr Asp Cys Val Ile Phe Ala Ile Ala Arg Phe
85 90 95

Gly Ser Phe Gly Leu Ala Val Ser Gly Ile Asn Leu Gly Glu Asn Leu
100 105 110

Ser Thr Glu Ile Thr Val Ser Gly Thr Ala Ser Ala Ala Ile Glu Ala
115 120 125

Ser Thr His Gly Ile Pro Ser Ile Ala Ile Ser Leu Glu Val Glu Trp
130 135 140

Lys Lys Thr Leu Gly Glu Gly Glu Gly Val Asp Phe Ser Val Ser Thr
145 150 155 160

His Phe Leu Lys Arg Ile Ala Gly Ala Leu Leu Glu Arg Gly Leu Pro
165 170 175

Glu Gly Val Asp Met Leu Asn Val Asn Val Pro Ser Asp Ala Thr Glu
180 185 190

Glu Thr Glu Ile Ala Ile Thr Arg Leu Ala Arg Lys Arg Tyr Ser Pro
195 200 205

Thr Val Glu Glu Arg Ile Asp Pro Lys Gly Asn Pro Tyr Tyr Trp Ile
210 215 220

Val Gly Lys Leu Val Gln Asp Phe Glu Pro Gly Thr Asp Ala Tyr Ala
225 230 235 240

Leu Lys Val Glu Arg Lys Val Ser Val Thr Pro Ile Asn Ile Asp Met
245 250 255

Thr Ala Arg Val Asp Phe Glu Glu Leu Val Arg Val Leu Trp Val
260 265 270



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 497 AMINO ACIDS

(B) TYPE: AMINO ACID 
(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: PROTEIN 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32: 



Met Lys Gly Lys Ser Leu Val Ser Gly Leu Leu Leu Gly Leu Leu Ile
5 10 15

Leu Ser Leu Ile Ser Phe Gln Pro Ser Phe Ala Tyr Ser Pro His Gly
20 25 30

Gly Val Lys Asn Ile Ile Ile Leu Val Gly Asp Gly Met Gly Leu Gly
35 40 45

His Val Glu Ile Thr Lys Leu Val Tyr Gly His Leu Asn Met Glu Asn
50 55 60

Phe Pro Val Thr Gly Phe Glu Leu Thr Asp Ser Leu Ser Gly Glu Val
65 70 75 80

Thr Asp Ser Ala Ala Ala Gly Thr Ala Ile Ser Thr Gly Ala Lys Thr
85 90 95

Tyr Asn Gly Met Ile Ser Val Thr Asn Ile Thr Gly Lys Ile Val Asn
100 105 110

Leu Thr Thr Leu Leu Glu Val Ala Gln Glu Leu Gly Lys Ser Thr Gly
115 120 125

Leu Val Thr Thr Thr Arg Ile Thr His Ala Thr Pro Ala Val Phe Ala
130 135 140

Ser His Val Pro Asp Arg Asp Met Glu Gly Glu Ile Pro Lys Gln Leu
145 150 155 160






WO 97/48416 PCT/US97/10784

Ile Met His Lys Val Asn Val Leu Leu Gly Gly Gly Arg Glu Lys Phe
165 170 175

Asp Glu Lys Asn Leu Glu Leu Ala Lys Lys Gln Gly Tyr Lys Val Val
180 185 190

Phe Thr Lys Glu Glu Leu Glu Lys Val Glu Gly Asp Tyr Val Leu Gly
195 200 205

Leu Phe Ala Glu Ser His Ile Pro Tyr Val Leu Asp Arg Lys Pro Asp
210 215 220

Asp Val Gly Leu Leu Glu Met Ala Lys Lys Ala Ile Ser Ile Leu Glu
225 230 235 240

Lys Asn Pro Ser Gly Phe Phe Leu Met Val Glu Gly Gly Arg Ile Asp
245 250 255

His Ala Ala His Gly Asn Asp Val Ala Ser Val Val Ala Glu Thr Lys
260 265 270

Glu Phe Asp Asp Val Val Arg Tyr Val Leu Glu Tyr Pro Lys Lys Arg
275 280 285

Gly Asp Thr Leu Val Ile Val Leu Ala Asp His Glu Thr Gly Gly Leu
290 295 300

Ala Ile Gly Leu Thr Tyr Gly Asn Ala Ile Asp Glu Asp Ala Ile Arg
305 310 315 320

Lys Ile Lys Ala Ser Thr Leu Arg Met Pro Lys Glu Val Lys Ala Gly
325 330 335

Ser Ser Val Lys Glu Ser Ser Lys Val Cys Arg Ile Cys Pro Asn Arg
340 345 350

Gly Arg Ser Gln Tyr Ile Glu Asn Ala Leu His Ser Thr Asn Lys Tyr
355 360 365

Ala Leu Ser Asn Ala Val Ala Asp Val Ile Asn Arg Arg Ile Gly Val
370 375 380

Gly Phe Thr Ser Tyr Glu His Thr Gly Val Pro Val Pro Leu Leu Ala
385 390 395 400

Tyr Gly Pro Gly Ala Glu Asn Phe Arg Gly Phe Leu His His Val Asp
405 410 415

Thr Ala Arg Leu Val Ala Lys Leu Met Leu Phe Gly Arg Arg Asn Ile
420 425 430

Pro Val Thr Ile Ser Ser Val Ser Ser Val Lys Gly Asp Ile Thr Gly
435 440 445

Asp Tyr Arg Val Asp Glu Lys Asp Ala Tyr Val Thr Leu Met Met Phe
450 455 460

Leu Gly Glu Lys Val Asp Asn Glu Ile Glu Lys Arg Val Asp Ile Asp
465 470 475 480

Asn Asn Gly Met Val Asp Leu Asn Asp Val Met Leu Ile Leu Gln Glu
485 490 495

Ala
497



(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 584 AMINO ACIDS

(B) TYPE: AMINO ACID
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: PROTEIN

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:



Met Pro Arg Asn Ile Ala Ala Val Cys Ala Leu Ala Ala Leu Leu Gly
5 10 15

Ser Ala Trp Ala Ala Lys Val Ala Val Tyr Pro Tyr Asp Gly Ala Ala
20 25 30

Leu Leu Ala Gly Gln Arg Phe Asp Leu Arg Ile Glu Ala Ser Glu Leu
35 40 45

Lys Gly Asn Leu Lys Ala Tyr Arg Ile Thr Leu Asp Gly Gln Pro Leu
50 55 60

Ala Gly Leu Glu Gln Thr Ala Gln Gly Ala Gly Gln Ala Glu Trp Thr
65 70 75 80

Leu Arg Gly Ala Phe Leu Arg Pro Gly Ser His Thr Leu Glu Val Ser
85 90 95

Leu Thr Asp Asp Ala Gly Glu Ser Arg Lys Ser Val Arg Trp Glu Ala
100 105 110

Arg Gln Asn Leu Arg Leu Pro Arg Ala Ala Lys Asn Val Ile Leu Phe
115 120 125

Ile Gly Asp Gly Met Gly Trp Asn Thr Leu Asn Ala Ala Arg Ile Ile
130 135 140

Ala Lys Gly Phe Asn Pro Glu Asn Gly Met Pro Asn Gly Asn Leu Glu
145 150 155 160

Ile Glu Ser Gly Tyr Gly Gly Met Ala Thr Val Thr Thr Gly Ser Phe
165 170 175

Asp Ser Phe Ile Ala Asp Ser Ala Asn Ser Ala Ser Ser Ile Met Thr
180 185 190

Gly Gln Lys Val Gln Val Asn Ala Leu Asn Val Tyr Pro Ser Asn Leu
195 200 205

Lys Asp Thr Leu Ala Tyr Pro Arg Ile Glu Thr Leu Ala Glu Met Leu
210 215 220

Lys Arg Val Arg Gly Ala Ser Ile Gly Val Val Thr Thr Thr Phe Gly
225 230 235 240

Thr Asp Ala Thr Pro Ala Ser Leu Asn Ala His Thr Arg Arg Arg Gly
245 250 255

Asp Tyr Gln Ala Ile Ala Asp Met Tyr Phe Gly Arg Gly Gly Phe Gly
260 265 270

Val Pro Leu Asp Val Met Leu Phe Gly Gly Ser Arg Asp Phe Ile Pro
275 280 285

Gln Ser Thr Pro Gly Ser Arg Arg Lys Asp Ser Thr Asp Trp Ile Ala
290 295 300

Glu Ser Gln Lys Leu Gly Tyr Thr Phe Val Ser Thr Arg Ser Glu Leu
305 310 315 320

Leu Ala Ala Lys Pro Thr Asp Lys Leu Phe Gly Leu Phe Asn Ile Asp
325 330 335

Asn Phe Pro Ser Tyr Leu Asp Arg Ala Val Trp Lys Arg Pro Glu Met
340 345 350

Leu Gly Ser Phe Thr Asp Met Pro Tyr Leu Trp Glu Met Thr Gln Lys
355 360 365

Ala Val Glu Ala Leu Ser Arg Asn Asp Lys Gly Phe Phe Leu Met Val
370 375 380

Glu Gly Gly Met Val Asp Lys Tyr Glu His Pro Leu Asp Trp Pro Arg
385 390 395 400

Ala Leu Trp Asp Val Leu Glu Leu Asp Arg Ala Val Ala Trp Ala Lys
405 410 415

Gly Tyr Ala Ala Ser His Pro Asp Thr Leu Val Ile Val Thr Ala Asp
420 425 430

His Ala His Ser Ile Ser Val Phe Gly Gly Tyr Asp Tyr Ser Lys Gln
435 440 445

Gly Arg Glu Gly Val Gly Val Tyr Glu Ala Ala Lys Phe Pro Thr Tyr
450 455 460

Gly Asp Lys Lys Asp Ala Asn Gly Phe Pro Leu Pro Asp Thr Thr Arg
465 470 475 480

Gly Ile Ala Val Gly Phe Gly Ala Thr Pro Asp Tyr Cys Glu Thr Tyr
485 490 495

Arg Gly Arg Glu Val Tyr Lys Asp Pro Thr Ile Ser Asp Gly Lys Gly
500 505 510

Gly Tyr Val Ala Asn Pro Glu Val Cys Lys Glu Pro Gly Leu Pro Thr
515 520 525

Tyr Arg Gln Leu Pro Val Asp Ser Ala Gln Gly Val His Thr Ala Asp
530 535 540

Pro Met Pro Leu Phe Ala Phe Gly Val Gly Ser Gln Phe Phe Asn Gly
545 550 555 560

Leu Ile Asp Gln Thr Glu Ile Phe Phe Arg Met Ala Gln Ala Leu Gly
565 570 575

Phe Asn Pro His Leu Glu Lys Pro
580

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 301 AMINO ACIDS

(B) TYPE: AMINO ACID
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: PROTEIN

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:



Met Tyr Lys Trp Ile Ile Glu Gly Lys Leu Ala Gln Ala Pro Phe Pro
5 10 15

Ser Leu Gly Glu Leu Ala Asp Leu Lys Arg Leu Phe Asp Ala Ile Ile
20 25 30

Val Leu Thr Met Pro His Glu Gln Pro Leu Asn Glu Lys Tyr Ile Glu
35 40 45

Ile Leu Glu Ser His Gly Phe Gln Val Leu His Val Pro Thr Leu Asp
50 55 60

Phe His Pro Leu Glu Leu Phe Asp Leu Leu Lys Thr Ser Ile Phe Ile
65 70 75 80

Asp Glu Asn Leu Glu Arg Ser His Arg Val Leu Val His Cys Met Gly
85 90 95

Gly Ile Gly Arg Ser Gly Leu Val Thr Ala Ala Tyr Leu Ile Phe Lys
100 105 110

Gly Tyr Asp Ile Tyr Asp Ala Val Lys His Val Arg Thr Val Val Pro
115 120 125

Gly Ala Ile Glu Asn Arg Gly Gln Ala Leu Met Leu Glu Asn Tyr Tyr
130 135 140

Thr Leu Val Lys Ser Phe Asn Arg Glu Leu Leu Arg Asp Tyr Gly Lys
145 150 155 160

Lys Ile Phe Thr Leu Gly Asp Pro Lys Ala Val Leu His Ala Ser Lys
165 170 175

Thr Thr Gln Phe Thr Ile Glu Leu Leu Ser Asn Leu His Val Asn Glu
180 185 190

Ala Phe Ser Ile Ser Ala Met Ala Gln Ser Leu Leu His Phe His Asp
195 200 205

Val Lys Val Arg Ser Lys Leu Lys Glu Val Phe Glu Asn Met Glu Phe
210 215 220

Ser Ser Ala Ser Glu Glu Val Leu Ser Phe Ile His Leu Leu Asp Phe
225 230 235 240

Tyr Gln Asp Gly Arg Val Val Leu Thr Ile Tyr Asp Tyr Leu Pro Asp
245 250 255

Arg Val Asp Leu Ile Leu Leu Cys Lys Trp Gly Cys Asp Lys Ile Val
260 265 270

Glu Val Ser Ser Ser Ala Lys Lys Thr Val Glu Lys Leu Val Gly Arg
275 280 285

Lys Val Ser Leu Ser Trp Ala Asn Tyr Leu Asp Tyr Val
290 295 300



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 257 AMINO ACIDS

(B) TYPE: AMINO ACID
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: PROTEIN

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:

Met Arg Ile Leu Leu Thr Asn Asp Asp Gly Ile Tyr Ser Asn Gly Leu
5 10 15

Arg Ala Ala Val Lys Gly Leu Ser Glu Leu Gly Glu Val Tyr Val Val
20 25 30

Ala Pro Leu Phe Gln Arg Ser Ala Ser Gly Arg Ala Met Thr Leu His
35 40 45

Arg Pro Ile Arg Ala Lys Arg Val Asp Val Pro Gly Ala Lys Ile Ala
50 55 60

Tyr Gly Ile Asp Gly Thr Pro Thr Asp Cys Val Ile Phe Ala Ile Ala
65 70 75 80

Arg Phe Gly Asp Phe Asp Leu Ala Val Ser Gly Ile Asn Leu Gly Glu
85 90 95

Asn Leu Ser Thr Glu Ile Thr Val Ser Gly Thr Ala Ser Ala Ala Ile
100 105 110

Glu Ala Ser Thr His Gly Ile Pro Ser Val Ala Ile Ser Leu Glu Val
115 120 125

Glu Trp Lys Lys Thr Leu Gly Glu Gly Glu Gly Ile Asp Phe Ser Val
130 135 140

Ser Ala His Phe Leu Arg Arg Ile Ala Thr Ala Val Leu Lys Lys Gly
145 150 155 160

Leu Pro Glu Gly Val Asp Met Leu Asn Val Asn Val Pro Ser Asp Ala
165 170 175

Ser Glu Gly Thr Glu Ile Ala Ile Thr Arg Leu Ala Arg Lys Arg Tyr
180 185 190

Ser Pro Thr Ile Glu Glu Arg Ile Asp Pro Lys Gly Asn Pro Tyr Tyr
195 200 205

Trp Ile Val Gly Arg Leu Val Gln Glu Phe Glu Pro Gly Thr Asp Ala
210 215 220

Tyr Ala Leu Lys Val Glu Arg Lys Val Ser Val Thr Pro Ile Asn Ile
225 230 235 240

Asp Met Thr Ala Arg Val Asp Phe Glu Asn Leu Gln Arg Leu Leu Ser
245 250 255



(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 264 AMINO ACIDS 

(B) TYPE: AMINO ACID 
(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: PROTEIN 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 



Met Glu Asn Leu Lys Lys Tyr Leu Glu Val Ala Lys Ile Ala Ala Leu
5 10 15

Ala Gly Gly Gln Val Leu Lys Glu Asn Phe Gly Lys Val Lys Lys Glu
20 25 30

Asn Ile Glu Glu Lys Gly Glu Lys Asp Phe Val Ser Tyr Val Asp Lys
35 40 45

Thr Ser Glu Glu Arg Ile Lys Glu Val Ile Leu Lys Phe Phe Pro Asp
50 55 60

His Glu Val Val Gly Glu Glu Met Gly Ala Glu Gly Ser Gly Ser Glu
65 70 75 80

Tyr Arg Trp Phe Ile Asp Pro Leu Asp Gly Thr Lys Asn Tyr Ile Asn
85 90 95

Gly Phe Pro Ile Phe Ala Val Ser Val Gly Leu Val Lys Gly Glu Glu
100 105 110

Pro Ile Val Gly Ala Val Tyr Leu Pro Tyr Phe Asp Lys Leu Tyr Trp
115 120 125

Gly Ala Lys Gly Leu Gly Ala Tyr Val Asn Gly Lys Arg Ile Lys Val
130 135 140

Lys Asp Asn Glu Ser Leu Lys His Ala Gly Val Val Tyr Gly Phe Pro
145 150 155 160

Ser Arg Ser Arg Arg Asp Ile Ser Ile Tyr Leu Asn Ile Phe Lys Asp
165 170 175

Val Phe Tyr Glu Val Gly Ser Met Arg Arg Pro Gly Ala Ala Ala Val
180 185 190

Asp Leu Cys Met Val Ala Glu Gly Ile Phe Asp Gly Met Met Glu Phe
195 200 205

Glu Met Lys Pro Trp Asp Ile Thr Ala Gly Leu Val Ile Leu Lys Glu
210 215 220

Ala Gly Gly Val Tyr Thr Leu Val Gly Glu Pro Phe Gly Val Ser Asp
225 230 235 240

Leu His Asp Phe Ile Leu Gln Val Ala
250

Lys Lys Tyr Met Glu Val Ala Val
260

Pyrolobus Cumarius lA (lph7(
SEQ ID NO:37

1 TCC CCG AGC GTG TTG CCA AGA TGC TTG AAA GAA TGC TAT CCA AGG CGG AAT CTA TGC TCG

60

61 GCG ACC CCC AGA GCC TTA TCC AGG AGG CTA AGG CCG TTG AGG CTA AGA AGC TCT TAG CGG 

120 

121 CTG CTC ATA GGC TAG TAG ATC GCC TAG AGG ATC CTC TCG ACC ACG CCC TCA ACC ATA TAG 

180 

181 AGC ATC ACA AGG AAC ATC ATG AGC ACC ACC ACA AGG AGC ACG ACT AAC AAC ACT CTT AGA

240 

241 ATC TCC ACA CGA OCT TGC TTC CCG TGT CTC TCG CGC CTA CCC ACT TTT TAA TAG CCT AAG 

300 

301 CCG AGA CCC ACA TTC CAA CAT TAC TCC CTT TGT CAC TAT CAT CTT CTA ATT CTC ACA CGC

360 

361 CCC CTA TAA ATT GGC GGA CCT CGA CGA AGC GTT GCC GGT GAC CCC CCG TGC CCA AGA AGG 

420 

421 CTG TCT GCC CAA TAT GCG GTG GCG ATG TTG AAC TAC CCG ATA ACG TAA TGG ATG GCG AGA 

480

481 TCC TGG ACC ACG ACT CTG GCG CAA TGC TAG TCG TCA GGA TCC CGG ATG CCA ATG TTG TTC

540 

541 TAG AGC ACT TGG AGC GCG TTG AGG ACG ACT CGG GAC ACT AGA GGC TAT GCC CAT ACC AAT 

600 

601 CCT TTA TGA CCA TCC GCG TGT TCA GGA GAA GAC GTT AGC TGA GGA ACC GAG GAA CCT TGC 

660 

661 TCA CGA ACC TGT CCT CTT TAA TAT TGA CTC CTT CCT CTT TCC CCT TGA TAG CCT GGA CCG 

720 

721 CAT TCT AGG CGA TGT TCA TGT ACT ACT TCA GAG GGC GGT GAG TTA CTT CAA CGC TCT CGA 

780 

781 GTC TAC AAG GAT ACT CGA GGC TGC CGG CTA CAC TCT CAT CAA CAA TAG TTT ACT GCA CCT 

840 

841 TAA CTG CGG CGA CAA ACT ATT GAC AAC GAT CTT GCT TGC TAA CCA TGG TGT GCC AAC ACC 

900 

901 GCC TGC ATA CGC TGC TTT TTC GCG TGA CAC TGC TCT GCG CCC TGC AGA GGA CCT TGG ATA 

960 

961 CCC CGT TGT TGT CAA GCC CGT CAT TGG TAG TTG CCG TAG GCT TCT CGC TAG GGC TCA TTC 

1020 

1021 CAG GGA GAG TCT AGA GGC TGT GAT ACA GCA TAG AGA GGT TCT CGG CCC GGC TTA CTA CAA 
1080

1081 GGT TCA TTA TGT GCA AGA CTA TGT GCG CAA GCC TCT ACG TGA CAT ACG CGT ATT CGT GAT 
1140 






H41 
12O0 



1260 



13B0 



1440 



LS60 



15G1 
1630 



1£21 

icao 



;o 

1320 

U:i A== CCT CCT C=T ^ C<=. CAT T« C=C =AA CTC GC* CTT C« 0« C.C TO. 0.0 OOT 

0 

0.C coo OTT T«V «T OOC T.O OOC TAT COT COA OTA TOC AOT OTC =0T COC OAA OAO OTC 

0 

AAT OOA ATG OAT A= OTA OAC OTO CTT CTO OAT OAO OCT AOO COT OOC OCT ATA OAO OOT 

1500 

OAC OCT COC COC OCA TOT OAA CCO OCA TTA AOO CTO OTT OAC OTT OTO CTC COC OAO 000 

CCT AOO OTT OCA CAO OAC TCT 000 COT OOC ATT OAA CCC OOT OAT OTA CTA CTA OCT OAO 

OCT CTO AOC TTO AOA OCA OAO CAO OTO AAO OAO OAO CCC AAO" OCO OAC AAT TOT CTO OAO 

U.. CTC OCA AAO OCT OCA TTC COC CTC TAT AAO COO CTC CAO COO ATO CAO TAA ACT TCO CAO 
1740 

TOT Orr OCC cot TTT ACC CTC TCC CTT act TTC TAC TCO COT CAO CCC ACT CTC CCT TCA 

l&OO 

..0. CAC OTT OCT OOC CCC AOC TCA OAA ACO ACC TCC ACA TOA TAC CCO ACA TCC TCO ACA ACC 

1860 

AOA TCC ACO ACA COA TAC TCC COC AOC OTC TTC CCO AOC AAC OAC TTC TOT TCA TTC OCA 

1920 

OCC OTC ATT err TCC COC CCO CAC TTC TAC CCC AOC ATO CCC OCA TAC CCO TCO CAC CCC 

1980 

ATC CTC TTO ATC TCC TAC TCC CTC CCO TTC ATC OCC CTC CCC ACO CTA TAC TCC TAA OCC 

2040 

TTO CTO CCC OCT CAA AAC OAO TTO TTC ACC CCC CTC CTT TCC TOT CTT CAC OTO CCT TTC 

2100 

,.0X OTA TCA TAO CCO TCA CCC CTA ACC ACA OCA OTC CTC TCC CAC OCA CAC CAC ACO TTA CCC 

2160 

TCA ACC TCC TCT ATT CTC ACC TCC CCT CTO OCA TCO OCO CCC CAC CCC ATC TCO CTA TOC 

2220 

TTC CAC CCC TCT CCO CAT TCT TCA ACO CTA OAC CTC CTA TAC CCO AOA ACC TTO TTC ACC 

2280 

,,.1 ACC CCC TOC err TCO ACC CTC ACC CTO tot A=0 COC OTC TOC OCC TTO CTO TAC CCT CTO 
2340 

2341 CCC TGT TCA TGC TOT TGA AGA TCT GCG AGT TGC TCG CAC ACT GCG CCA CCT GGT CCC ATC
2400

2401 TAG AGC AGT TCG CAC ACG CAC CTG TCT ATG GCA CGA GAA GCA ATA TAG TCG TCG TGT ATC
2460

2461 CGA TCC TCG TTG TGA GAG GAG CAC GCT AGA GGA GTA TCT CTC GGC CTT CCG GGA CCC CGC
2520

2521 GTT TGA GGT CAC CAC TGT ACC CGT CTT GAA CGA CCC TTG GTC TAC AGC TAT TCT CCA CCC
2580

2581 TAC GCT GGC CAT CTC CAG TGC TGC AGA GAC CGC CTT CAG TCG CGG CAT TGA GCA GCC GCG
2640

2641 ATA TCG TGC ACA TCC CGC GCT TAG CAG GCT AAC CAG GCT GAT CTA CCT AGA GGA GTA GAA
2700

2701 CCT CTC GAG GAC CGG TAT GTA GTG GTC TAG AGG CTT CCC GTC ATG GTG TAT CGC GAG GCC
2760

2761 TAT TCC TGC TCT CCT CGC GCC TTC CAC CTT GGC CTC ATA ATC ATC TAT GAA TCC TGT TTT
2820

2821 CGC TGC GTC CGC GCG AAG CAG TTG CAT CCC CGC CTC GTA TAT CTT TGT CTC TGG CTT GCA
2880

2881 AAA GCC GAC AAT ATC CCT CGT AAC CAC CCT ATC CAC GAG CTG GCC TAG ATC GTC ACG CTC
2940

2941 TAG AAG TAC ACG TAC GCA TTC GTA GCA CCA GTT GTT CGA GAC TAT GCC GAC CAG TAT CCC
3000

3001 GTT TCT CTT GGC CCA TCT TAG CAG CTC GTA TGT ACC CGG TGC TAC GTA TAC CCC ACA CAG
3060

3061 CAC ACC TGA TTG CAA TAC CCT TGC TAA TGC CTC TGC CCT TGA GGC GGT CGG CGT CAA GCC
3120

3121 GTG TTT TGC CAG GAG CAC GGC AGC CGC ATA CAC TAT ACT TTG TTC CAC GGA GAC ATC CAC
3180

3181 CCT CCA CGT GTC CAT TAC ACG CCT CAC CCT ATC CGG CGT CGC GTC GGC CCC TAG GGC ACC
3240

3241 TAG ATG TCT GGC ACC ACT CTC GTA CAG ACT CTC CTC GTA CCA CTC ATT TGT GAC GTA AAT
3300

3301 GAC GCC ACC TAA ATC CAG CAC CAG TGT ACG GTT ACG CGG CAA GGC GCC TCC TCA TGT ATT
3360

3361 CGA GGA GGC CGC CCG TTG CCA GAA TTT CAG CTA CAA CAC CCC GGA AGG GCG GGA AAC GCT
3420

3421 ACG TCA ACA CCC TAC CAT CCT TCT TGA TGA CCT TCC CTA CAC CCT CCT CAA GCT TTA TCT
3480

3481 CTA TCT CGT CGC CCT CCT CGG CCC CCT CCA CGA CCT CTC GGA GCA CTA TAA CGG GGA GCC
3540

3541 CGT TOT TAA TCO CGT TAC COT AG^ ATA TTC TCG ACA AGC TCT TCG CIA TCA TCC GCT TGA
3600

3601 CGC CTC CAG CCT TCA GAG CTA TCC CCG CTT GCT CCC TCC TAC TAC CCA TAC CAA AGT TCC
3660

3661 TAC CCG CGA CCA GCA CTA CAC CCT TGG ACG CCT TCT TGG OGA ACT CCG GAT CCA GAG GCT
3720

3721 CCA TAG CAT GCT CGG CAA GCT TCT CCC GCT CAG TAT ATA CCA GGT AGC GGC CAG GGA TAA
3780

3781 TCA CGT CGG TCT TGA TCT TAT TGC CGT AAT TCA GCA CAG GGC CCT TCA CGA CAC CCA GGT
3840

3841 TCA AGA GAG GTT CAC CAC AAG TTT GGC CTC GCT ATC CCA CGC TAT AAT CCA CCT GTT TAC
3900

3901 TCG GCC AGC TTC ACC CAC ACA CTT TTC AAC TCC ATT ATC CTT GTA GCG CAA TCT ACC CTT
3960

3961 CTG GGT AGC ACA GCC TTA AGC CCA TAG TGC CAA GGC CCC ACA ATG ATG CCC TCC GGC ACA
4020 

4021 TTC TCG TCG GGT ATC AGC CGG ACG CCT ATG GCC CCT CTC TCC GTC TCG AGC CTA OCG TGA 
4080 

,081 CCC GCC CCA GCC TCC TTA GGG TTG ACT COT GCG TAT AGC TCG CCG CTC ACA TCT AGC ATC 
4140 

4141 GCG TTT GTA CAG TAG CTC ACC GGG TCT CTT GCA GTC ACG ACC ACC TTC CTA TCA CCA TCG 
4300 

4,01 GGC ACG ACC CGC TCO ACC GGC CCG TAT ACA CGG ACC COT ATC CTC CAC ACA CCC CTG GOC 
42CD 

42S1 AGG AGG TAC TCG CCT CTC TCC GCA ACC GCC TTG GAG GAA 4299 
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Thermococcus 9N-2 Olphl) 

SEQ ID NO:39

1 7GC ACT GAT AAA GAA AAA GAA GAG GTT TAA GGC CCT CAA TAT TAA ATT CTA CAC ATT AGA 

SO 

6L TAT CCA AAA TGG AGA ATT ACT TAA TCT AGA GAC TTA CCT TAA CGA GTT ACA TGA GTT CCT 

120 

121 TAG AGC CCT TAC ATT AAA ACG AAA ACT AGA AGA CGA ACA ATG ACC CCC GAA GAG CTC CTA 

lao 

161 ACC CCC CTC GAA TTC AAA GGA GTA ACC CTC GAA AAG ATG CTC AAT ACT GCG TTA GAG CTC 

340 

241 TAC ATC GGC GAC GAG CGC GAG AAA GTT CCA GAA AGG CTG AGA GAG CTG ATG CTG AGG TAT 

300 

301 CTG GGC GAC ATC AAC GTT CAA GCT CTG CTC TTT TCG GCT CTA CTG CTC GAA GAG AAC TTC 

360 

361 AAG GTT GAG GGC GAC CCC GTG AAC CTT GTG GCC GAC GAG CTC ATC GGC ATG AAC ATC GCC 

420 

421 GAG CTC ATA GGT GGA AAG ATG GCG CTC TTC AAC TTC TTC TAC TAC GAC ACC AAG AAG CCC 

480 

4 81 CGC ATT TTA GCC GAG CTT CCG CCT TTC CTC CAC GAT GCG ATA GGG GGC TTT ATA GCG GGC 

540 

S41 TGT ATG ACA AGG CTG TTC GAG GGG CTG TAC GGT GCG CAA TCT CTT ACC CTT CTT CAC GCG 

600 

601 GAT TCC GGT CAA ACC CAA CTT CAA AAG CCT TAG AAA TGA GCT CTG GGC ACT TCC CAT TCT 

660 

661 CGC ACC GGT AAC TTC GGC CCT GGC GAC CCT CCT GGG CTC TCT GCT CGC CGC CCT AAT AAT 

720 

721 CCT GGG CGG CAA CTA CCC GTT TGA CCC AAC GTC TCG CCA ACC CAC CTC CTG ATA ACC CTC 

7B0 

761 ATA GGC TTC GTC GTG GTC TAC AGC ATA CTG TTC TAC ATC TGG CTC CAC TTC GTC AGG AAG 

840 

841 CTC ATC AGG GAC GGC CCC GAA CCG GTT GAG GGT CAC "GTC ACC CCG AAC CCC ACC CCT GCC 

900 

901 GTT AGC GCC GCG GGA GGT GCT CAC TGA TGG ACT ACG CGA CCG CAT GGT TTT ACT TCT CCG 

960 

961 CCT TCC TCC TCG CAA TGT ACT TAG CCT TTG ATG GCT TCG ACC TTC CCA TAG GCG CCT TCC 

1020 

1021 TCG CCC TGA TTA AGG ACC AGA GGG AGC GCG ACA TAC TCG TGA ACA CCA TCG CGC CGG TCT 
1080 

108 1 GGG ACG CCA ACG AGG TCT GGT TCA TCA CCT GGG GTG CCG GGC TCT TCC CGA TGT GGC CCG 
1140 
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1141 
1200 



126 
1320 



1.01 TCA TAT TCA COC CTO TCC OCT TT= ACT TCA CCA ACA ACA ACA ACO AGC TAT OCC ACA AOC 

0 

1 TCT TC3 CTC TCO TCA CCC CCT TAA TCC CCC TCC TCA TCC GCG TCA TAC TCG GCA ACC TCA 
1380 

1,81 CCr ACC CCC TCA TCG TCC GCC TCT TCA TAC TCT TCC CGG TGA CCT GGC ACG GAG CCA ACT 
1440 

M41 GGG GCG TCT ACA AAA CCA CAG GAA AGC TCC AGG AGC AGA TGA GGG AGC TCG CCTT TCA AGG 
1500 

1501 CCT GGC TCC -^A CCG TCG TCT TCC TCC TGC TCA CAG TCA TCG GCA TGA AAA TCT GGG CCC 
1560 

15« CAC TCA GCT TCG AGA GGG CAC TAA CGC CGC TTG GGC TCC TCC TAA CGG TTG TCA TCC TCG 

1620 

U21 TGO CAG GAC TGC TCG ACC GAC AGC TCA TCA AGA AAG GGG AGG AGA ATT TGG CCT TCT ACA 
1680 

»,1 TCA GCT GGC TGG CCT TCC CGC TCG TTG TGT TCC TCG TCT ACT ACA CAA TGT ACC CCT ACT 
1740 

,7,1 GGG TCA TCr CGA CCA CCG ATC CGA ACT TCA AGC TCA GCA TAC ACG ACC TCG CGG CAT CTC 
ISOO 

1.01 CGC TOA CCC TCA AGG CCG TCT TGG GAA TCT CGC TGA TCC TGG CGG TCA TCA TCA TCG CCT 
Lfi«0 

l.,l ACA CCC TCT ACG TAT ACA GGG CCT TCO GCG GAA AGO TCA CCG «=G CGG ACG GCT ACT ACT 

1920 

1,„ GAG TTC CCC TTT CCT TTT TCG ATA TTC GAA CTT TTT TAG GGA AAA GTT TAT AAT TCG AGT 

i»ao 

.1 CAC CTA AGT TCC TTC TGG AAA CCT «VA AAA CGG TGG TCG AAA TGC ACA GAG GCA GAT CTA 

1 CCS CCT GGC CCT ACG ACC GGA AGC CGG TCC TCG TCT TCT GGG AAA CC.% CCA AAG CCT GCC 

,101 GGC TCA AGT GCA AGC ACT GCA GAG CGG AGG CAA TAC TCC AGG CAC TGC CGG GCG AGC TOA 
0 

1 ACA COG AOG AGG GAA AGG CCC TCA TCG ATI CCC TCA CCG ACT TCG GAA GGC CCT ACC CGA 

2221 TAC TCA TTC TCA CCG GTG GCG ACC CGC TCA TGA GGA AGG ACA TCT TCG AGC TCA TCG AGT 

0 

».l ACG CCG TTG AGA AOG GCA TTC GCG TTG GTC TCG CCC CCG CTG TAA CGC CCC TCC TGA CCG 

10 
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23-11 ACG AAA CAA TCG AGA GAA TCG CCA GGA GCC GAG TTA AGO CGC TAA GCA TAA CCC TCG ACA 
2400 

2 401 CCC CGT TTC CAG AAG TTC ACG ACG CAA TCA GAG GCA TAG AAtJ GGA CGT GGC AGA AAA CCG 
2460 

24 61 TCT CGG CCA TCA AGG ACT TCC TGA AAC ACG CCC TAA GCC TTC AGG TGA ACA CGG TTC TGA 
2S20 

2S21 TGC CCG AGA CCG TT3 AAG GAG TGC CCC AGA TCG TGA AAC TCC TTA AAG ACC TCG GCG TCC 
2SS0 

25 91 AAA TCT GGG ACG TCT TCT ACC TCC TCC CGA CCC GCA GGC CCA ACT TCG AGA GCC ACC TGA 
2fi40 

2S41 CGC CGG AGG ACT GGG AGG ACC TCA CAC ACT TCC TCT ACG AGG CCT CGA AGC ACC TCC TCG 
2700 

2 701 TGA GGA CCA CCG AGG GCC CGA TCT TCA GGC GAG TGG CGA TAA TGA GGA AAG CCC TTG AGG 
2760 

2761 AGA AGG GAT TCC ACC CCG ACG AGG TTC TCA AGC CCG CGG AGC TCT ACT TCC GGC TGA AGA 
2920 

2821 AAC GGC TCG TTC AGC TTC TCG GCG AGG CGA ACG ACG CGA GGG CCC AAA CTA TGG GAA CGC 
29SO 

2881 GCG ACG GGA AGC CAA TAG TCT TCA TCG CCT ACA ACG CCA ACG TCT ACC CGA GCG CTT TCC 
2940 

2941 TGC CCT TCA GCG TCG GCA ACG TCC GCG AGA AAA GTT TCG TTG AGA TTT ACA CGG AGA GTG 
3000 

3001 AAC TTA TGA AAA AGC TCC GCT CCG CCG AGT TCG AGG GCC GCT GCC GGA GCT GCG ACT TCA 
3060 

30fil GGG AAA TCT GCG GGG CAA GCA GGG CGA GGG CCT ACC CCT ATC CCT TAA ACC CGC TCC CCC 
3120 

3121 AAG ACC CTG CCT GCC CGT ACG AGC CCG GCT CAT ACC TAA GGC TCG CCA AAA AGT TCA ATC 
3180 

3181 TTC ACC TTC CGA TTG ACA TTT TTG GAG CCC AAA ACC CGA TTT GAG GTG ATG GAA ATG AGC 
3240 

3241 TGG AAG GCT CTT TTA CTQ ATT OCA ATC CTC CTC GTG TCT GTC CTC CGT GCC GGA TGC CTT 
3300 

3301 GGC TCC AAT ACC TCA ACT GAA ACC GGC CCA TCC CAG AAG GAA ATA ACC GTG AAO GAC TTC 
3360 

3361 TCG GGA AGG AAC ATC ACG GCT AAA GTT CCG GTT CAC CGG GCG GTC GTT CTC TCG ACT TCC 
3420 

3421 GCC CTC GAA ATA ATC CAG CTC CTC AAC GCG AGC CAC CAG GTC GTC GOT ATT CCA AAA CAC 
3460 

3481 GCC CAG TAC GAC GCT TTA CTG AGC GAA ACC CTC AAG AAC AAG ACC CTC GTT GGC GCG AGC 
3540 
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3660 



3661 
3720 



3TB0 



3840 



PCT/US97/10784 

WO 97/4»416 

3600 

. CAC CTC MO ^ TTC T.C «C 0^ C.C =A= CTC CTC MC CCC TCC =CC ACC T.C C=. 

CCC =TC =tC CO CTO C«= CTT CA= O.C AT* CCC .AC CCC CTT TCC CTC CTC 

0 

. n.r- rcr TCA AAG ATT CCA GCG GAG GAG AGA AAG AAG CCG ATA 

3781 GTG AAG GAG GTT AAG GCC ATA GCC TCA AAU 

<0 

.TO CAO CCC ATA ATO 0=C AAO CTC TAC CTC CTC AAC GCC AAC CAC CTC CTT OCT CAO 

3900 

CCC CTC ACO CTC CTT =00 OCO OAC TAC CTC CTO AAC CTO ACC TTC AAC OOC TAC ACT CCO 

3960 
0 
0 

CTC AOC OAC OAO OCC TOO AOO OOC ATT AAC OCC OTC AOO OAO OCC AAC OTA OTA ATC CTC 

0 

.U. AOO OCO OAC ATO OOT AAA OAC TCC TTC CTC COC TOO AOC CCO CCC TTO OCA OT. OOA ATC 

0 

;o 

<;ACrrTC,«AAOAaOTTTTACOOCCTCTCCTOATTTTTCTTrTOOOOTOOOACOATOATA 

4320 

0=0 OTC TTT CCA OCO ACT CTC OCO OAA ATC OTC AAA CTC OTC 000 AAA OCC 000 OAO ATA 

4380 

„.V OCC OOA CTO AAC OAO OAA ATC AOO TTC OAC CCC TOC CTO CCO OAO CTO AAO OAT AAO OCT 
4440 

OTC ATC OOA AAO TAC CTC AAO COO AOC AAO AOO ACC TAC TOO OAC OTT TTA OAC OAO CTT 

0 

AOO CCO OAC CTT ATC CTC OAC TTC OAT OTT OAO AAC CTO CAC TCC 000 OAC OAO CTO AOO 

(0 

,SCX OCC TTT 000 OAO COT ATA 000 OCA AOO CTC OAO CTO ATT OAC TTC OAO ACC OTT OAA OCC 
4620 

. j.rc AGG AGG ATA GCC GAG CTA ACG AOG GGC GAC TTT TCA AM CTC GGC 

4621 TTC GTC GAG CCG AGC AGG ACQ AiA v**-*- 
0 

.... 000 TTC TAT OAO AAO CAC CTO ACO AOO CTO OCT OAO ATA ACT OAA OCC ATC OAO OAO AOO 



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.... COT ^ OCC CT. CTC .CC T.C CCC T.C OTC CT. .CC ACC .CC ^AC CTT CTC .CC 

4B00 

. rrr rCG ATG AAC CTC GGC GAG AGG ATA CGG ACA AAG CC= AAG 

4 801 GAC GCG GTT AGA AAA GCA GGG GCG ATG AAL i-i^ 

4360 

„ -rr TTr GGC GAT GCG GAC CAC CTC T7C 

4861 GTC TAT CCG GTA AAG AAC GAG CGC TTC TTC AGG .CC TTC GG. GAT 

20 

„n CTC CTC ACC AGC .T. ATC AC. CAC AG. 0A= AAA ATG GAG GGG ATA AGG GAT CAA ATC CTT 



4920 



4980 



/-n* rrr n'-T GAG CTC GGA AAC GTC CAC ATA GTT GGC 
4 981 GAC TCG GCC GAG TGG AGG CCA ATG GAA GCC GTT CAG CTt u 

0 

sc. TCC OCC CTC GAC CTT GAG AOC ^C ATG CGC TCG ACT CCC CGC ATA ATC CCG GGA ATC TAG 

10 

,,0. CAG CTT GGA AGG TTT ATA CAC GGA ACA AAT CAC CCA CCA ATC TCG TGG AAA TCA CTG CAA 

;o 

AAG TTT AAA ATC CCC CTC CCA CCC CTC GAA GAA CAA AAA CGC ATC GTC GCC TAC CTC GAC 

to 

- — „ ^^r. f~rr TAC GAG GAG CGG GAG AAG GAG CTT 

S;2L TCG ATA CAC GAG CGC GCC CAA AAG CTG GTA AAG CTC TAC GAG GAG 

BO 

CAG AAG CTT TTC CCC GCG GTG CTT GAT AGO GCG TTT AGO GOT GAG CTG TGA TTC CGG GAA 

40 

^ ^ ^rA CGG CAA TCT TTG AGA TAG TCA GCG OCT TTG TTC TCT CCC TCG 
1 TGG AAT ACG GCT TTG AGA CGG CAA TLr iii* m*« 

, TAG TCA OOG CTT TCG CTT ACA GTT TTC GTC TTC CAT OGO TAT CCT TTT TGT TCA ACG TTC 

s„. rrr cga tac ttc tga caa tag gcc tga tto aca aaa tgc cct tct got cca tgt cat atc 



534 
S400 



S40 
S460 
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0C1/4V nsphi) 

SEQ ID NO:39 

1 AGC TTG GAT ATC GAA TTC CTT ATA TGA AAA ATT CAT CGA ATT GGT AAA AAA CCA CGA TCT 
61 TCA TCT GGA AAC TCG AAT ATT TGC TCC CCA TAT CCT TCT GGA AAT ACA TAA CGA TGC TCC 

120 

121 GGT GAC TTT GTT ACT TGA TTC AAG AAA AGG TAT TTT GAA GTC ATC TTT CCT GTC TCT AGC 

180 

181 AGG ACT ATA TGC CTG AAT ACT CGC ATA GCA ATA AAA ACA ACT TTT TTG CCC AAA ACG ATG 

240 

241 TGA AG A ATT GTC ATC TAG TGC ATG TAT GTT GTG CAC CCG ATT TGG CAA TTT CTT ATT TGT 

30O 

301 CCG CTG CAC GTG GTG ATA TTT TCT TTT ACA ATC CTA ACA TAC ATC CAA AAG CTG AAT ACG 

360 

361 ACA AAC GAC ACG CCG AAG TGA TTA AAA TTG CTG CAC TCT TTA AAA TCA ATC TTC TGA AAC 

420 

•121 TTC . CTT ATA ATC CTG ACC TGT TCT TCA ACC TTA CTA AAG GAT TAA AAA ATG AAC CTG AAC 

480 

4 81 GCG GGA CAA GGT GCG AGA TTT GTA TAA GAA TGC GAC TAC AAA AAA CAA TCG AAT ACG CCA 

S40 

S41 AAG AAA ATC GOT ACA AGA GTG TTT CCA CAA CGC TAA CAG CCT CTC CAA AGA AAA ATG TAG 

eoo 

601 CGA TGA TTC TGA AGA TAG GAA AAG AAC TGG AAA AAA AAT ACG GTC TGG AAT TTT TGC CTA 
€61 ATG TGT ACC GCA AAA GTC CGC TTT ACA ACG ATC CGC AAA AGC TTA TAA CGA AAA TGG GTT 

720 

121 ATT TAC AGA CAA AAC TAC TGT GGT TGT ATT TTC TCA ATA AGA ACT TCC CTT ATA GTA CCC 

780 

781 ACT CAA GAA ACT AAA ACC GTA AAA ACT CCC CTC GAA GTA TGA AAA TAT ACC ACA AAT TAG 

B40 

841 AAG AAG TTG AAG AAC ATA AGC GGT CGT ATG CAT CAA TTG CTT TTT CAT CGA AAG TCA GCG 

900 

901 TTG AAT ATG AAC ATG CTG GCG AAA AAC TTG CCC TCA TCC CTG TAA CTA TTG GAG ACC TTA 

960 

9 61 CGG TGG TTA TCG AAA TTC ACG ATC ATA CAG AAG TAT TCA ATA CTT TGT TGA ACG AGC ACA 

1020 

1031 TCA AAA ACT CTA TCC TGA AAC AGT TTC CGT ATC CGG AAG AGA TTA GAG GGT TAG CCA GAC 
1080 

10 81 ATT TTC GCA CAG AAT TGA AGA ATT TCA GAA TCT TGG TTG TAA AAT ACA ATA GTC TCG AAG 

mo 
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^ r.. rCT ATT CAC TGT CTA ATA TAA CAT TCG GTG TGG TCT CAT ACA ATA 
114 L AAA ACG AAT TCT CAA GGT ATT CAL 

1200 

no: ™ Tcc .rr ™t t.c c« ex. .ro t« tc. c« c« c.t »ct =tc 

1260 

us. XT. CC .TC TCC »C CTO «<= ^ C=C ^ C.T TCT T.C CCC OCT 

0 

»i.r TCC CCA AAT TAG CGC TTC AAA CCA CTG ACA TTG 

1321 GGT TTG GTG GTG GAA CCT ACG ACC AAC TGC CCA AAT TAG 

0 

... .cc r^o^^oo..,. xc. .c. Tco Txc TOT c^ .T. r™ .OA 

0 

„ m etc OT* TAA T« ACC TAA AC= AAT TTA CAA CCC A=A CAT ATT TT= ACC CAT 

. TT= CTA COC TTC AAA TCA TAT CAC rTC CCA TAA TAC TCC CCA ACT CAC ACC CAC CAC CTA 

, ACT TTC AAC CAC ACA CTT ACO ATA TCA TTT ACA CCA CTT ACT CAA AAT ATA AAA TTA OCA 
CCA CTT CTT ATA CAT ACT TTT CTA ACA TTC CCA CCA CTC TTC CAT AAA CAT ATA TTC CAT 

10 

.0. CCC CAT TTT CCT CCA AAA CTC CAC CCA ACC AAA AX= AAA CCA CCA CAT TTA CAT CAT ATC 
1860 

...X ACA CCA CCA CCA CTC CCA CTT TAX ATA ATT CAA CCC ATC TTC CAC ATT ATC CAA TTC CCA 

1920 

0 

„.X CAC CAC CTT CCC AAT TTA AAT AAT CAA CCC TAT CCA CAA TAT TCA AAA AAC COT CAA AAC,, 
(0 

CCC CCA AAC ACA AAT CCC CTT CCT CAC AOCXTT CAC AAA AAA TCC ATC TTT CAT CTC 

2100 

^ ^ rrr ACC ACA ATT TTT GAA ACT GGT GAT ATA CAA AGA TTT CCG 
2101 TGC ATT TTT TGG CTT TTT GCC AGC ACA ATTT Trr 

;o 

TTC CCC TCT TCC TTA CAC TAC ACC TTT TCA TTC ACC CCT TAC ACC CAT CCC TCC AAT TTT 
„.X CCT TCC TAT TAC AAC ACC TCC TCT TCC TXT ACO A«= AAA TOT CCA CCC CAA TAA ACC 



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2341 ACA TAC ACT AAT GGA AGC TAT CTA TOG CTT CTT ACT TCG GTC GAT ACT CCC TCT CCT TAC 
2400 

24 01 GGT TAA GTT CTA TCG ATA ATT TTC AAT GAG TTG TAC TGA AAT AGC CCA AGT CTT TTT TCG 
2460 

2 520*^^ '"■'^ CAt CAT AAT CCC AGG AGG GTA ATT TAC AAT CTT TTT TAC ATT ACC ATT TAA ACT 

2Sa"^^ ^ ^ ^^'^ '^'^ '^'^^ '"^ "^^^ 

264"^^ TGA TGA GCA GAT AAA AAC ACC AAA TTC GTT TAG AAG TGC GGT GAT TAA GAA AAG AGC 

2700^'*^ "^^^ GAA TCT AAA CAC CGC CCC AGA GTT TGT AGA TGA CCT ATG GAA TGC CAT ATA CAC 

2760'°^ """AT A<JG <^AC AAA ATA CAA CGT TCC CCC AAC GCT TAT AGC CGC TGT CAT TTC TCT AGA AAC 

2820^*^ ^ '^^^ ^ ^^^^ "^^^ *^^° ^^'^ '^'^ "^^"^ ^^''"^ 

29 80^^^ AGC CAA AAA TAT ATC GAA ACT CCT CGC CCT CGA ACA ACC AAA AAA CGC TTC GGA TGA 

294"^^ CCT CAC AAA TTA TTG GTT GAA TAT AAC TTA CCC TAC CGC ATA CAT CGC TTA TCT TTA 

2941 CAA AAA GCA TGG AAC TTT ACA GAA ACC GCT CGA AGA ATA CAA CAA CGG AAA AAA TAA AAC 
3000 

3001 TAA ATA CGC CCA GCT GAT ACT ACA ACA ATA CAA CCT ATA CCA GAG CCT CCA TTC TGC TGA 
3QS0 

312"*^ A*C *AA TAA CCA GCA ATT GGA TAC AGA TAA TTC TTC CAC ATC TTC TGA ACC AAC AGA 

3121 TAC TTT GAA TAC AAC CAG TCC AAC AAA TTC ACA ACC AAC ATC AGA TGC ATC AAA TAC ATC 
3180 

31B1 AGT TAA CAC TTC AGA AAT CAA GTT CCC GCC TCT TTT CGG AGT TGC AGG TTA TTA AGA TAT 
3240 

3241 TTG TTC GGT AGT TAC TTA GCA ATC TGC GGT CTA TAG TTT GGA AGA TGA AAA AAT CAA ACC 
3300 

3301 TGA AAC GAT ACT AAA AAT TGA ACA TTT ATC TTT TTC TTA CCC GAG TTT CAC TCT CAA AGA 
33S0 

3361 TGT AAG TTT TGA GCT TCG GAA GCG AAG TTT CTT CGG CAT TAT TGG ACC AAA TGG TTC GOG 
3420 

3421 AAA AAC CAC GCT ACT CTC ACT CAT TAT GAA ATT CCA AAA CCC AAA AAG TGG GAA AAT AAC 
3480 

3401 AGT TGA TGG GAA CGA TGT GCT CAG GCT ATC TCA CAA AAA ACT TGC ACA ACT TAT AGC ATA 
3540 
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]€00 



1601 



3661 
3720 



3721 
J730 



3781 
3840 



CAt'cGC TCh AGA CTT TAA CCC 
AGG AAT CCC CCG CTC ACC ACA 
TGC ACT CAA AAC TGT TGA TTT 
AGG ACA ACA GCG CAG GGT CTT 
TGC TGA TGA ATT GGT TAA TCA 
AAA ACA ACT TAC CGA ATG TGG 



TAC ATA CGA TTT CAC ACT 
TTT TTT CGA AAC ACC TGT 
GCT TGA ATA CCC AAA AAG 
GAT TGC ACG CGC AAT CTA 
CTT GGA TTT AGG CCA AGC 
AAA GAC CAT AAT TGG ACA 




TCA AGA ATT GGT CGA AAT CGG 
TTA CCA GGA AGA ATT AGA AAA 
AAT ATT CTC CAC TCT TAG TGC 
TCA AAA CAC ACC TAT CAT CAT 
AAT TAA AGT CTT AGA TTA TCT 
TTC CAC CTG CAG CCC GG 3 8 96 
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Archaeoglobus lithotrophicus TF2 (Sijhi)
SEQ ID NO:40

I ATG TCC TGC AAG GCG ATT AAC TTG GTA ACG CCA GGT TTT CCC AGT CAC GAC CTT GTA AAA 

120 ^^'^ '^'^ ^"^^ c:-" 

IBO ^^'^ '^^^ "^^^ "^^^ AAG AGA 

24 0 ™ ''"^^^ '^^^ '^^^ TGC 

300 TAT GGC AGA GCG TGA AAG AGC ACA TGA GAA CGA GTC TCA AGA AAT GGG CAA GGG CGT TGG 

360 ^ ^'^'^ ''^^ '^^^ GAA CGA AGA ATA 

4 20 CAA GGA AGC AAA CGA GAG ATA CAA GAA GGC TAG AGA AGA GTT TGA AAG AGC AAA GAA GAT 

4 80 GGG ATT GGA CAT CAG AGA GGA GCG CGG ATT CAA GAT GGC CAA GGG ATT CAT CGT AGC TGC 

540 ACT AGA CGT TCC TCA GAT GTG GCT CGA GAG ACT GAA GGT ACA CGT CAT GAa' TAT CGG TGA 

600 ^^'^ °" "^^^ ^ G*"^ G*A CCT 

5«0 ^ ^ ^'^^ A^'^ TGA AGA GCT CAT 

720 ^'^'^ CAA GAA AAT CAG AAA GGA GTG CAG AGA AAT CAG AGA TGA AAT GAC GGC TCT 

780 "^^^ ^^"^ ^^"^ ^'^^ AAA CCT TGT TGA AAA GGC CAA GCA GGT AGA 

840 GCT AAT GCT TGA GGC AAA GAT CGA GGA GCT CGA TCC TCC AGC AGT TGA TAC AAC CAA ACT 

900 ^ TAA TGA ACC AGA ACA TTT GAT TGA CAA 

960 *** '^'^ ""^ "^^^T C** GGA AGC 

1020*^^ CAA GGA TGT CAA GCA AGT TGT CAG CGA GAT CAA GGA 

108^^^ '^'^ ™ CGG CAA GAT CTT CTA CGG AAA CGA GAC TGG ACA AGT 

in"" ^™ ™ '^^ ™ CGG TAC CGG TAT CGT TGT GAT CAG AGC 
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1200 



120 
1260 



1261 AGA 
1320 



13B0 



^ CCr C.C CCT C« COT =C* AOO T» C.;. CTT C« CCT C.T === CT* C== 

T.C C*. .CT CCA TCC TO. COO TO. AT. CO COT «A =M 0«= CCC AC. 00. .0. 0.T 

0 

rrA GTT TGA GCA TTC CAT TTC CAA GAT TTT TGC 
UBl GAG ATT TAA ACT CTT TCT TCA ACT CTA GCA GTT TGA GCA 

0 

Arr TCG AGA CAG GCT CAA ATG TTG TCC CAG CAT 
1441 TGT TAG CTT CGG GAC AAC TTT GAA AAT ACG TCG ACA CAG G 

0 

T^n fTT TCC GCT CCC CAG CCC AAC ATG CCT TCT 
UOl TGC AGC TTT CGG CAA AGC GAA CGA GAT TTG CGT TCC GCT 

1560 

CT. «= TO. X«: TTC «C TTC «=C TTT CCC «. «C .TC C CTT TTC CCC 

. „C .CT TCT TCT TTC C« .TT T.T TOO .TT TCC TTT C.C CO. ..T OCT^.TC CO. rrC 

... TCT TCC CC «C CTC 0.T .TO COO CTC TTC C«= .OC ..T .CC C.C TCC .CC OTC «T CCT 

L TCC «C CTO OCC OTT C« .TC ..T 0«= COT O.T .TO ..T TCT CO. OOO «=T TTT CTT 

)0 

^» .Kr CCT AAT TAC TTA AGA ACT TTT GOT TTT GCG AAA AAG 
X801 AAC ATA CAT CTA TAC AAT TTA AAC GGT AAT TAC 

;o 

„ ..OC.O.T0.T..CC0OT0«T.TCCTT.TT.TCTCTCC«>CCT0..CC0.OC.TOTC«= . 

„. TO. T.T COO .TC OTO .TC O.T OTO «0 .T. ^C CC CTC C 0« «=. C« .CC 
.TC ^ C«= C «T ^ ..T T.T .TC OOC «T ... TOO «=C TCC CO CTC TTC .OC «C 

0 

^ *Tr arc AAC AAC AAC ATC AAT ACC TTT CAA CTG TCT 
2161 TTT TCC AGC CAT TGA AAT CTG CTT ATG AGC AAC AAC AAC 

;o 

CCrO«TTCTTT.T..TC.T0COO0..O0O.T..O.O.TT.T.CCO..TC.^.CTCT 
^ .TO CC «C .TC .T. .TC OTT TOC CTC «0 TOO CTT T.T OCT OOC .TC .«= CCT CC 






2041 TCC 
2100 



2160 



2220 



2280 



22S1 
2340 
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2341 ATC CAT TGG TGT AAC TAC ATC TCC AAT ATA CCC AAT GCA ACC AAC ACC ACT TC7 CCA GAG 
2400 

2401 CAA TTC CAT GAG CAT TCT OCT TCC GAT GAG AGC GAC ACT AAA GtT CCT GAG ATA ATC TAT 
24£0 

24 SI CTT TTC TTC ATC TGC CAT CCC ATA CCA GGA AAT TTT TCT CAT GCC AAT AGC CCC CCA TCC 
2S20 

2S21 ATT AAA TCC TAT TAA TTT TTT GCC GTA TTT TGA GCA GCT AGA TAT TAA CCA ATT ATT TTC 
2580 

2581 AAA CCA TTT AAC GCC ATC GAT CAA ACA TCC CAA AAC CAG TTC AGC AAA AAA TTA AAT CAC 
2640 

2S41 TGC CAC ACA TTC AGC ACC CCA AAA TGG TGT GAG AAA TGG ACG AAC TGG GAG GAG TTA TTT 
2700 

27fi0^°^ TTG ATC TGA TAG AAG AGG AGC CCG AAG TTC AGG AGC ACG ACG AGA TTA ACC TCC CAG AGA 

27 SI TAT ACA GCC TTC CTA CAA AAC TTA TAA AGT TAC TCG AAG ATC TCA AAA GCC ATC AGC TTA 
2820 

28 21 AAG ACT CAC CAT CTC TTA TGC TCA TAA AGC AAA TTA TCG GTG AAG ACA GAG TTC TGG TTG 
2860 

2881 GTT TAG CAT CAA AAA TGC TCC AGG ATA TGA GTC TCG CCT TCG AAG AGG ACG AAA AGT ACG 
2940 

2941 TTT CTT GAT TTT TGA ACT GTA TTT TCT ACA TCC TCT TTT CCC AAC CAC ATT CAG TTC CAT 
3000 

3001 CCC ATA CGA AAA TTC CAA TGC CCA AAT CCT GGT AAA TGT ACT TTT TCA TAG TAA ATG CTG 
30fiO 

3061 CCA AAC CCA GAT TAA ACT CAA TTT CAT CAA CAG GAA AAA GAA AGA ACC AAA AAA ACA CCT 
3120 

3121 ACA ACA GTC CTA TAA TTG ACC AAA CTT GAT AGA TTA CAA ACA CCA CAG TTG CAA TCA AAG 
3180 

3181 CAC AGA TGA AAG CTT TCC GGA TTC CTG CAG CC 3212 
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Methanococcus thermolithoautotrophicus SN1 (l-tpMI



Nucleic acid-SEQ ID NO:41
Amino acid-SEQ ID NO:42



, C» AT. .T. »C .AA CTA AA. AAA .TT CO. TAT A.C ^ C.T 0^ 0^ ^ 

" . „e. Clu il. lie Asn Uy. P.. L.u ty, L.ys .1= O^V ty^ «P "V =1. =W Ly. 

" » AA= CAC AAA TCT AAA ACC AAA ATA AAA ATT =AA CAA =AA AAA ACC ATC =AT ATC OAA ATT 

M .y. ASP .ys S« Ly, Thr Ly, lU Ly. n. =lu =lu GIu Vy. Thr Asp 11. GU II. 

" CCA AAA ATT CAA CCT ACT GAA AAT TTT AAT C=T OAT CAA ATT CTT TTT GAO GAA GAT AAT 

" X.. GCC TAC OGT ATA TCC CAC AAA G=A AAT AGA ACA AAC AAC GAA GAC AAT ATT TTA ATT AOA 

.1 Al. Tyr =:y n. S.. Hi. Ly, Oly Asn Arg T»r Asn Asn Glu Asp As. U. L.u U. Ar, 

60 

»X AAA ATA AAA OAT ACC TAC ATA TTA GCA GTT GCA OAT OGT GTC OOA GOO CAC AOC TCA OGA 

"° .y. II. Ly. ASP T.r Tyr 11. L.u Ala val Al. Asp Oly «.l Oly Oly Hi. s.r s.c Gly 

301 OAT GTT OCA TCA AAG ATO OCA GTO GAT ATT ^A GAA AAC ATT ATC ATG GAA AAA TAC AAT 

»1 OAA AAC CTA TCA ATT GAA OAO ATA AAA OAA CTT TTA AAA GAT GCA TAC ATT ACO GCA CAC 
Olu A.a L.U S=r tl. Glu GU U. Ly. olu L.U ..u tys Asp Al. Tyr II. TKr Al. His 
"°,.l AAC AAA ATA AAA GAA AAC OCT ATT OGA GAT AAA OAO OGA ATO OOA ACA ACA CTA ACA ACT 

"°„1 GCA ATA OTT OGO OAT AAA TGC GTT ATA GCA AAC TOC GOG GAT AOT AGO OCT TAT TTA 

"°I« Al. n. V.1 Lys Gly ASP tys cy. v.l 11. Al. A.n cy. oly A.p «r Ar, Al. Tyr »u 

MI ATT «=A OAT OGA OAA ATA OTT TTT AOA ACA AAA OAC CAC TCT TTG OTT CAG OTT TTA OTA 



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^€01 CAT GAA GGA CAT ATT TCA GAG GAG CAC CCA AGG CAT CAT CCA ATG AAA AAT ATC ATT ACC 
201 Asp Glu Gly His Ile Ser Glu Glu Asp Ala Arg His His Pro Met Lys Asn Ile Ile Thr



fifil TCA CCA TTG GGA TTC GAT GAA TIT AAG GTA CAT GAT TAC GAA TCG GAT TTA ATT CAT GGT 

720 

221 Ser Ala Leu Gly Leu Asp Glu Phe Lys Val Asp Asp Tyr Glu Trp Asp Leu Ile Asp Gly

780^^^ GAT GTA TTA TTG ATC AGC TCC GAT GGG CTT CAT GAT TAT CTC AGT AAG GAA GAT ATT TTA 

241 Asp Val Leu Leu Met Ser Ser Asp Gly Leu His Asp Tyr Val Ser Lys Glu Asp Ile Leu 260

fl^o''^^ ^ ^"^^ ^ 

261 Lys Thr Val Lys Asn Asn Asp His Pro Lys Asp Ile Val Asp Glu Leu Phe Asn Thr Ala



841 TTA AAA GAG ACA AGG GAC AAT CTG AGT ATT ATT CGT ATA 879 
281 Leu Lys Glu Thr Arg Asp Asn Val Ser Ile Ile Arg Ile 293
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Pyrolobus fumarius 1A (Iphl)



SEQ ID NO:43-nucleic acid



SEQ ID NO:44-amino acid



„C ACT CTO OT. CCC CTC T« .AO MT »A CCT CTT ATC OTC .AC CTT CC=C TCC CC= ACC 



lHHjM*_lt-.J'-'" 

" . «« T>,r ..u .e. Ala L.. Ty. Oin A.n Lys A., val H. VaX Lys Le. o.y Trp Oly Ur 

CTC CCC AGA CTC CAT ACC CCT CTA CAC TTA CAT CAT CTC AAC CTC CTT CCC ATA CAC TAC 
val .ro Ar, L.u Hi. T«r Ar, Leu A.p U.u Asp Asp Vai Ly. L.u val Ala U. Clu Tyr 
" ATA CCC TAC AAC ACC CTT AAC CCC CTC CCC CCC TTC AAC CCC CTT AAC CCT CTC ACA CCC 

"° 61 lU Pr» Tyr ty. «r L.u Aan Ala val Cly Arg Leu Aan Pro Leu Lys Ala Val Thr Ala 
" »1 CTC rrc TAT ACA CTC CCA TCC CTA CTC CAT ATC CAC CCC CCT CCT TTT CCT CAT T=C CAC 

"°,C1 CTA AAC CCC CCT AAC CTT ATA CC* CTT CCC AAC CCT CCC ATC CTC TTC ATC_CAC TTT CCT 

»1 CTT CCA CCA CCT TTT CAC CCT CCC CCC TTC CCC CCA CCA ACA CCA CCC TAT ACC TCC CCA 

„1 CAC CCT CTC COC CCC CAC ACC CCC CCC TCT CCC TCC CAT CTC TAC ACC CTT CCC CCC ATA 

'..1 TAC TAC TAC TTC CTT ACC COC TTA ACC CCC CCA CCC CAC CCA AAA CAC TTC CCC AAC CCC 
Ul Ty. Ty. Ty. Leu val T.r Cly Leu Ser Pro Pro Ar, Asp Pro Lye Clu P«e Al. Lye Ala 
S,l CTC TCC TTC OCT CCC CCT CCA ACT ACC CTC TTC CAA CTO TTC ACA CAO CTC CTC CTC OAT 

„1 CCC CAC TAT CCT AAC ACC CTT CAT CCT CTC CAC CTC. TTC AAC ATT CTT CCA TCT TTT AAC 
',.1 Pro Clu Tyr Ar, A=„ S.r Leu Aap Pro L.u Cl„ Leu Leu Lys U. V.l Al. Ser P«e A.„ 
„X CCC CAA CTO CTA OTC CCT CAT ATC CTT ATA CAT OCT CTT TAC AAC CCC CTA OCT TAC OCC 
pro cm Leu Leu Val Pro Hi. II. v.l II. Asp Cly Val Tyr Ly. Pro Leu Cly Tyr Cly 






,n CAC CTA ACC ATA CCC TCT ACA CCC CTT ATA COT CTT OAT OCA CCA CCA CTO TAC CTC OCC 
' „i Olu v.l S.r 11. Oly ser Ar, cly v.l He Ar, v.l A.p Cly Ar, Pre val Tyr L.u Ala 
,.1 CTT AAO ACC CAT OTC AOC COC ACA ACT ATC TAC OCC TAT ACC CAT CTT OTC OTC TTT ACO 



780 
260 

840 

260 
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841 ACA GGC GAG AAA CTC ATA GTG AGA AGC GCT GAG AGT ATA GAC CTA GAG TTT AAC GAC CTG 

900 

281 Arg Gly Glu Lys Leu Ile Val Arg Ser Gly Glu Ser Ile Asp Leu Glu Phe Asn Asp Leu



300 



901 GTG TTG TTC GAC AAC CAC ATA CTA TAC GTA TTT ATC CTT CCC GAA AGG CCC 951 
301 Val Leu Phe Asp Asn His Ile Leu Tyr Val Phe Ile Leu Pro Glu Arg Pro 317
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Thermococcus celer (25ph2) 



SEQ ID NO:45-nucleic acid



SEQ ID NO:46-amino acid



, „C CAC «C AOC CCC GTT OTT TTT C AC CTC CAC OOO ACC CTT GTC CCT CCT CAC AAV; ... 

" 1 «ec A»p lU Ar, Al. val V.l Ph. Asp L.u Asp Oiy Thr L.u Val Sly Ala =1. Lys Thr 
„ TTC A=C GAG ATA AAG TCC GAO CTT AAA GAA CCG CTG ATT TCG TTA GCG ATT CCC AGG GAG 
Ph. S.r Glu ti. cys S« Glu L.U Lys Glu Arg L.u U. S.r L.u Gly 11. Pro Ar, Glu 

121 CTC GTT OGA GAG CTA AGO CCG ATO TAT GAG GGC CTT ATC GAG CTG TCC AGA AAA ACG CGC 

,1 L.U val Gly Glu L.U Thr Pro «t Tyr Glu Gly L.u U. Glu L.u S.r Arg Lys Thr Gly 

181 AGA CCT TTC GAA GAG ATG TAC TCA ATT CTC GTC AAT CTT GAA GTT GAA AGG ATA AGG GAC 

51 Arg pro Ph. Glu Glu H.t Tyr S.r II. L.u v.l Asn L.u Glu val Glu Arg tl. Arg Asp 

2,1 AGC TTT CTC TTC GAO GOG GCA AGG GAG CTC CTC GAC TTT CTT GTG GGO GAG GGA ATA AAC 

ei s.r Ph. L.U Pn. Olu Oly Al. Arg Glu L.u L.u A=p Ph. L.u val Gly Glu Gly II. Lys 

101 CTT CCC Crc ATG ACC COO AGC TCC AGA, ATG OCT GCC CTT GAG GCC CTG OAG CTT CAC GCC 

101 L.U Al. L.U «.t Thr Arg s.r s.r Arg «.t Ala Ala L.u Glu Ala L.u Olu L.u His Gly 

1» ATT AAG OAC TAC TTT GAO ATT ATT TCA ACG AGG GAT OAT GTC CCT CCC OAG OAG CTO AAA 

'l21 U. Lys ASP Tyr Ph. olu II. n. s.r Thr Arg Asp Asp val Pro Pro Olu Glu L.u Lys 

Ol CCG AAT CCT GGC CAG CTG AOG AGA ATC CTC GOT OAG CTC AAC OTT CAA CCA OAG AAA GCC 

1,1 Pro Asn pro Gly Gin L.u Ar, Arg U. L.U Gly olu L.u *.n v.l Oln Pro Glu Ly. Ala 

„1 ATC GTC GTT GGA GAC CAC GGC TAC OAT GTC ATC CCT GCC COO GAO CTC OGC OCT CTG AGC 

"°Ul n. val val Gly Asp His Oly Tyr Asp v.l U. Pro Ala Ar, Glu L.U Oly Al. L.u S.r 

S,l GTC CTT GTC ACC OGC CAC GAG OCT GOC AOA ATO AGC TTT CAG OTT GAA GCC GAO CCA AAC 

1,1 val L.U val Thr Gly Hi. Olu Al. Oly Arg «.c S.r Ph. Gin Val Glu Ala Glu Pro As. 

I 

.01 TTT GAG OTC OAG AAC CTC ATI CAC CTC AOO AAG CTC TTC OAG AGO CTC CTO TCO AGC TAC 

'„! Ph. Olu V.1 Olu Asn L.U II. Hi. L«. Ar, Lys L.u Ph. Glu Arg L.u L.u S.r S.r Tyr 

1 

«l GTT OTT GTT CCC OCT TAC AAC GAO 0*0 AAG ACC ATC AAO OGO GTA ATA GAG AAT CTT CTC 

„X val val val Pro Al. Tyr A.„ Glu Olu Ly. Thr II. Ly. Oly Val U. Olu A.n L.U L.U 

,Jl AGO TAT nC AAA AAG OAC OAG ATA ATC GTC OTG AAC OAC OGC TCC AGO GAT AGA ACG GAO 

2,1 Ar, Tyr Ph. Ly. Ly. Asp Olu U. II. Val val A.„ Asp Gly S.r Ar, Asp Ar, Thr Olu 

,.1 GAO ATA OCT CGT TCT TAC OOA GTC CAC OTT CTT ACG CAT CTC OTC AAC AOG GGO CTT GOT 

'„! Glu 11. Al. Arg s.r Tyr Gly v.l His val L.u Thr Hi. L.u Val A.„ Arg Oly L.U Gly 
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900 ^ ^'^^ ^"^ '^''^ ^ ^ 

281 Gly Ala Leu Gly Thr Gly Phe Ala Tyr Ala Ile Arg Lys Asn Ala Lys Leu Val Leu Thr 300

960*°^ ^ °" '^^'^ '^'^ ^"^'^ GCG 

301 Phe Asp Ala Asp Gly Gln His Leu Ile Ser Asp Ala Leu Arg Val Met Arg Pro Val Ala 320

1020*^ ^"^^ "^^^ '"^ ^^"^ '^^^ '^^^^ 

321 Glu Gly Arg Ala Asp Phe Ala Val Gly Ser Arg Leu Lys Gly Asp Thr Ser Gln Met Pro

lOao^* ^ ™ ^'^'^ ^""^ AAA 

341 Leu Val Lys Lys Phe Gly Asn Phe Val Leu Asp Ala Val Thr Ala Val Phe Ala Gly Lys

1140*^ ^'^'^ "^^^ ''^'^ '^^^ "^^^ ATC 

38o"^ ^'"^ ^" ^^'^ ^^"^ lie 

12o"^ AGG ATA ACC TGC GAC CGC TAT GCC GTC TCG ACT GAG ATT ATA ATA GAG GCC TCC AAA GCC 

40o"^ "^^^ ^ ^""^ ^^"^ 



1260 



1201 GGC TGT AGA ATT GTC CAA GTT CCT ATC AAG GCT GTT TAC ACT GAG TAC T7T ATG AAG 
401 Gly Cys Arg Ile Val Glu Val Pro Ile Lys Ala Val Tyr Thr Glu Tyr Phe Met Lys



AAG 



1320 



12fil GGG ACG AAC GTT TTA GAG CGC GTT AAG ATA GCC! CTG AAC CTT CTC TTT GAC AAA CTG 



AGG 



421 Gly Thr Asn Val Leu Glu Gly Val Lys Ile Ala Leu Asn Leu Leu Phe Asp Lys Leu Arg 440
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SEQ ID NO: 47 and 48



Aquifex pyrophilus (28phl)



20 



40 



- . ,ro.^^rcu.».>^ etc ctt e tc cc. »c .TC ..c ctt ccc c=. cc. 

0« ™ «C GAA TTC =GA «C ATT CTT ^ AAC ATT OAA CAA A.= CSA CA= A*= 

=AC TTC CTC ACC T.C CTT GAT AAA ACC TCC GAA GAG AOA ATA AAa' GAG CT* .T* CTT AAG 

.ax TTC TTT CCC GAC CAC GAG GTC GTG GGC GAG GAA AGG GGA AAG GAG GGA AAA GAA AGC CCT 

TAC AAA TGG TTC ATA GAC CCC CTT GAT GGG ACC AAG AAC TAG ATA AAG GGC TTT CCC ATA 

',0X TTT GCA GTC TCC GTC GGA CTC GTT AAG GAA AAC GAA CCT ATA GTG GGA GCG GTT TAC CTT 

3.. CCT TAC TTT GAT ACC CTA TAC TGG GCT TCA AAG GGA AGG GGA GCC TAT AAA AAC GGC GAG 

AGG ATA AGC CTA AAC GAA ACC GGG CAC CTC AAG CAC GCG GCG GTT GTT TAC GCA TTT CCA 
A., U. S.. V.X LV. Glu AT, CIV C.U L.U .y, Ki. Al. Ai. V.l Tyr Gly Pro 

TCA AGA AGC ACC AGG GAT ATA TCT CTT TAC CTC AAT GTC TTT AAA GAC CTC TTT TAC GAA 
S« AT, S« AT, A., AS. U. ser ... Ty. U.. A.n «1 .y, Glu Vai The Tyr Gi. 
S« CTA OCT TCC GTT AGG AGG CCC GGC CCC CCA GCG GTT CAT ATA TCC ATG CTT GCG GAC GGC 

„X ATA TTT GAC CGG ATG ATG GAC TTT GAC ATG AAC CCA TGG GAC ATA ACC GCG OGA CTC CTA 

"°«X ATA CTC AAG GAA GCT GCA GGA TTT TAC ACA CTG AAC GGA CAC CCC TTC GGC ATC TCG CAC 

"%,x U. L.U Ly. CI. AX. Gly Gly Ph. Tyr Thr L.u Lys Gly Asp P« P«. Glv U. Se. Asp 

„X ATA ATA GCG GCA AAC AGG ATC CTC CAC GAC TTC ATT CTC AAC GTT GTC AAT AAA TAC ATG 
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781 AAT AAT GAA AGC ACG 795
261 Asn Asn Glu Ser Thr 265
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SEQ ID NO: 49 and 50 Bacillus thermoleovorans (68fy5)



60 
20 
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1 ATG ACT GAA CAC CCG GTA TO TCT CTT CAA GGA TTA AGC GGC CCG TAT AGC ATC AAC CGA 
1 Met Ser Glu Gln Pro Val Leu Ser Val Gln Gly Leu Ser Gly Gly Tyr Ser Met Asn Arg

61 CCG GTT CTG CAT GAC GTA ACC TTT CAG CTT GAA CCG GGT GAG ATG GTG GGT TTG ATC GGC
21 Pro Val Leu His Asp Val Thr Phe Gln Val Glu Pro Gly Glu Met Val Gly Leu Ile Gly
121 CTG AAC GGT GCG GGC AAG ACT ACC ACG ATG AAC CAT ATT CTC GGG CTG ATG AAT CCG CAA
41 Leu Asn Gly Ala Gly Lys Ser Thr Thr Met Lys His Ile Leu Gly Leu Met Asn Pro Gln
181 AAA GGG AGC ATT CAG GTT CAA GGA AAG AGC CCG ACA GAC CAT TCC GAA GCC TAT CAC GGC
61 Lys Gly Ser Ile Gln Val Gln Gly Lys Ser Arg Thr Glu His Ser Glu Ala Tyr His Gly
241 GCC TTG GCC TTT GTT CCC GAA TCC CCG CTG CTG TAT CAG GAG ATG ACA GTA CGA GAG CAT
81 Ala Leu Ala Phe Val Pro Glu Ser Pro Leu Leu Tyr Glu Glu Met Thr Val Arg Glu His
301 CTG GAA TTT ACG CCC CGC TCC TAT GGC GTA TCC CGT GAA GAT TAT GAG GCA CCT TCG GAG
101 Leu Glu Phe Thr Ala Arg Ser Tyr Gly Val Ser Arg Glu Asp Tyr Glu Ala Arg Ser Glu
361 CAG CTG TCG AAC ATG TTC CCT ATG CAA GAC AAG ATG GAC AGC CTG TCC ACG CAT TTG TCC
121 Gln Leu Ser Lys Met Phe Arg Met Glu Glu Lys Met Asp Ser Leu Ser Thr His Leu Ser
421 AAA GGG ATG CCC CAA AAA GTG ATG ATC ATG TGC CCA TTC GTA GCC AGA CCG TCC CTG TAG
141 Lys Gly Met Arg Gln Lys Val Met Ile Met Cys Ala Phe Val Ala Arg Pro Ser Leu Tyr

181 ATC ATT CAC GAG CCC TTT CTT GGC CTT GAT CCG CTT GGG ATA CGC TCG CTC CTT GAC TTC 
Ul He lie ASP Glu Pro Phe Leu Gly Leu Asp Pro Leu Gly lie Arg Ser Leu Leu Asp Phe 

?!J P"^ 5^ "° ^ GCT TCG GTA TTG CTA AGC TCC CAC ATT s«i 

181 Met Le« Glu Leu Lys Ala Ser Gly Ala Ser Val Leu Leu J^r ler His tli 197 
SEQ ID NOS: 51 and 52 



Pyrococcus furiosus VCl Cphi* 



. „C ^ .CT ATT ACT TT. CT. CTT TT. CTT ... U. ;>.r 

CTC CCA TAC =AT TCC CA* =A= AaC ==T ATT AAA AAT ATA ATA ATC CTC ATT ==A CAC OCC 
... G=A AT. A=T CAT CTC CAC ATT ACA A.0 CTT =TT TAT =CT CAT CTA AAC ATC CAA CAC 

rrc ccA att att c=a ttc caa ctt act cag tca tta act c=c oaa =tt acc oa= t=c gct 

' OCA CCA CC. act CCA ATA CCA ACT OCA CTC AAA ACA TAT A*T CC. «C »TT TCA CTT ACT 

'°3 = . *AC ATA ACT CC. AAA CTT ACA AAT CTA ACT ACC TTC CTT CA. ATA CCC CAC CTA CTT CO. 

^ TCA ACT CCA CTT CT= ACT ACT ACT ACA ATT ACA CAC CCA ACC CCT CCa" CT. TTT CCT 
XCC CAC CTT CCT CAC ACA CAT ATC a« CAC CAA ATA CCC ACA ^ CTC AT. CCT CAC CCC 

,„C AAC CTC CTA TTA CCT CCA CCC AC* A« AAA TTT CAC CAC AAT ACC CTA AAA ATC CCA 

v.. «n val ..u L.U CIV Oly Olv AT, .ys L.. P.. A=p Clu A.n THr .ys 

^ CAC CCA TAT AAT ATA CTC TTC ACC A«CA* CAC CTC CAC AAA CCA CAC CCT CAC 

TTT ATT CTA CCC CTT TTT CCA CAT ACC CAC ATT CCT TAC CTA TTC CAC ACA A^ CCA CAA 

CCA CTT TTC C« ATC ACT AAA AAA CCA ATT TCA AT. CTA CAC AAA AAT CCA AAT 
A3P V.1 CIV L.U «u CU M,. Thr LV» LV. Al. H. U. L.u Clu .y. A.n «o A». 

ceo TTC TTT CTC ATC ATT CAA CCC CCC ACA ATT CAT CAT CCA CCT C.T CAC AAT CAT ATA 

CCA TCA CTT CTT C« CAC ACT AAC CAC TTT CAT CAC CTT CTT CCA TAT CTT CTT CAC TAT 
Ala va. Val AU CXu Tn. Ly. Clu Ph. A,p A,p val val Clv Tyr Val U.U Clu TV. 
CCA AAA AAC ACC CCA CAT ACA CTA CTA ATA CTC CTC CCT CAC CAT CAC ACA CCC CCC CTT 
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96o'°' "^^"^ "^"^ '''^^ =TC Arc AGO AAC ATA AAC CCT 

301 Gly Leu Gly Leu Thr Tyr Gly Asp Ala Ile Asn Glu Asp Val Ile Arg Asn Ile Asn Ala 

1020" ^'^^ """^ ^"^^ ATA AAG AGA CTT ATC AAA 

321 Ser Val Ser Lys Ile Ala Ser Glu Ile Arg Ala Thr 

loi"' ""^ TAC ATT GAG GAA OCT ATA AAC 

361 Lys Tyr Thr Gly Phe Glu Leu Thr Glu Asp Glu Ile Asn Tyr Ile Glu Glu Ala Ile Asn 

^^lOai TTA GCA CAC GAA TAT GCG CTT CAA AAT CCA ATA CCT GAT ATT ATA AAC AAA CGC GTT OCT 
381 Leu Ala Asp Glu Tyr Ala Leu Gln Asn Ala Ile Ala Asp Ile Ile Asn Lys Arg Val Gly 

125"' C«JT TTT GTA TCC CAC AAA CAT ACA GGA CCT CCT GTT TCA CTT CTA CCC TAC GGC CCA 

400'" """^ ''^^ P'^o val Ser Leu Leu Ala Tyr Gly Pre 

12fio" TTT GCA GGCTTT TTA CAC CAT GTA GAT ACG CCA AAG CTA ATT GCC AAG 

401 Gly Ala Glu Asn Phe Ala Gly Phe Leu His His Val Asp Thr Ala Lys Leu Ile Ala Lys 

132"" ^^^.''^ "^"^ ""^^ "^^^ °" ATC TTG GGA ATA ACT CGA CTT AAA 

421 Leu Met Leu Phe Gly Lys Lys Asp Ile Pro Val Thr Ile Leu Gly Ile Ser Gly Val Lys 

13a"'' """^ "'"^ ™ "^'^ ^^'^ ""^ <=AT CCA TAT CTG ACC TTA ATC ATG 

460*" ""^^ '''•^ *1* Tyr Val Thr Leu Met Het 

^^Uai TTG CTT CCC GAA ACC GTA GAT ACT GAA CTT GAA ACG AAA CTC CAC ATC AAT AAT AAC GGC 

4B0*'' ""'^ I-V^ Val Asp Met Asn Asa Asn Cly 

^ii} GAG TTG CGA GAC CTC CTC CTG ATT CTA CAA GAG TCC 1482 

481 He Ha Glu Leu Cly Asp Val Leu Leu He Leu Gin Glu Ser 494 
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Pyrococcus furiosus VCl Cphai 

SEQ ID NOS: 53 and 54 



, .TC ATT MA ^AC TTC »A .CC TCT C^T CCA CCA AGC >CA CAA CAA .;CC ... ... 

" 1 „ec Ue «n CI. Ue As„ Ph. Ly, TKr S« HU Cly Cly S.r Arg Clu Giu Oly Tyr lU 

.1 AAC rrc TCC CCC TCT CTA AAT CCT tat CCA CCA CAA TCO ACT CAT CAA ATC TTT CAC ACC 

.1 Asn Ph. S.. AU .er Val Asn Pro Tyr Pro Pre Clu Trp Thr A,p CU «.t Ph. Clu Ar, 
CCT AAA AAC ATA ACC ACC TTC TAT CCT TAC TAT CAA AAG CTT CAC CAA CAA CTC TCA CAT 

.X Al. Ly, Ly. U. S.r Thr Ph. Tyr Pr» Tyr Tyr Clu Vy. L.. Clu Clu Clu L.u s.r Asp 

lai CTA ATT CCC CAC CCA ATA ACT ATA ACT CCA CCA ATA ACA CAG CCA CTT TAC CTC CTT CCA 

,1 ...u II. CIV Olu pro II. Thr 11. Thr Ala Cly II. Thr clu Ala C.u Tyr L.u L.u Cly 

241 CTT TCC ATC AOC CCT CCC AAA CTA ATA ATC CCC AAC CAC ACC TAT CCC CAA TAC CAG ACG 

,1 val Trp M.r Ar, Cly Ar, Ly. Val II. U. Pro Ly. Hi. Thr Tyr Cly Clu Tyr Clu Ar, 

301 ATC TCA CCC ATC TTC GGA GCT ACG CTC ATC AAA CCT CCC AAT CAC CCA CCA AAC TTA CCA 

„1 CAA TTT CTT CAA ACA AAT TCA TTC CTC TTC TTC TCC AAT CCA AAC AAT CCA'cAT CCA AAC 

"°X=1 Clu Ph. val Glu Ar, A,n S.r Ph. val Ph. Ph. cy. A3n Pro Ash Aan Pro A.p cly Ly. 

«1 TTC TAC CCA CAA AAA CAG ATC AAA CCT CTT TTA GAT CCC ATT CAA CAC ACT AAC TCA ATT 

1,1 Ph. Tyr AT, Clu Ly. Clu «.t Lys Pro L.u L.u A.p Ala U. Gin A.p Thr Aan S.r II. 

,.X TTC ATC TTC GAT CAA GCC TTC ATA CAC TTT CTT AAC AAA CCA CAA ACC CCA GAG GGA CAG 
..u II. L.a Aap Glu Ala Ph. II. Aap Ph. val Ly, Ly. Pro Clu s.r Pro Clu Cly Clu 

„1 AAC ATA ATC ACC CTA AGO ACT TTT ACC AAA ACC TAC CCC CTC CCA CCC OTA AGC CTT GCA 

1.1 A.n U. II. AT, L.U AT, Thr Ph. Thr Ly, S.r Tyr cly L.u Pro Cly v.l Ar, Val Gly 

„1 TAT CTT ATT GGA TTT CTC CAT GCT TTC AGG AGC CTT ACA ATC CCA TCC TCA ATT CCC TCT 

'201 Tyr V.1 U. Gly Ph. v.l Aap Al. Ph. Arg S.r v.l Ar, Pro Trp s.r II. Gly S.r 

'„! ACT GGO GTG GCC TTC TTA CAC TTC TTA C« AAA CAT AAC TTC AAA CAC TTA ACA AAA ACC 
,21 crC CCC CTA XIA TCG AAA CAA AAC GAG AGG ATT GAG AAA CAA TTO AAA STT AAA AGC GAT 

„1 CCA AAT TTC TTC ATT ATO ^ CTC AGA CAA GCA ATA ATT CAA AAC CTA AAA CAG AAT GCC 

2.1 Ala A.. Ph. Ph. U. K.t Ly. val Ar, GU Gly II. II. Olu Ly. L.u Ly. Glu Asn Cly 

) 

..1 ATC ^ .TTA ^ CAT TCC AAC ACC TTT CGA CTC CCT GCC TAC ATA AGG TTT TCA CTT ACA 

'2»1 II. L.U val AT, A.P cy. Ly. S.r Ph. Gly L.u Pro Gly Tyr II. Ar, Ph. S.r v.l Ar, 
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901 ACG AGA GAA GAG AAT GAC AAA CTC ATA AAC ATC CTT ACA AAA ACA CTT AAT ACT 954 



301 Arg Arg Glu Glu Asn Asp Lys Leu Ile Asn Ile Leu Arg Lys Thr Leu Asn Thr 318 
What Is Claimed Is: 

1. An isolated polynucleotide selected from the group 

consisting of : 

(a) a polynucleotide encoding an enzyme 
comprising an amino acid sequence selected from the group 
of amino acid sequences set forth in SEQ ID NOS:28-36; 

(b) a polynucleotide which is complementary to 

the polynucleotide of (a) ; and 

(c) a polynucleotide comprising at least 15 
bases of the polynucleotide of (a) or (b) . 

2. An isolated polynucleotide selected from the group 

consisting of: 

(a) SEQ ID NOS:19-27, 37-41, 43, 45, 47, 49, 51 

or 53; 

(b) SEQ ID NOS:19-27, 37-41, 43, 45, 47, 49, 51 
or 53, where T can also be U; and 

(c) fragments of (a) or (b) that are at least 15 
bases in length and that will hybridize to 
DNA which encodes the amino acid sequence of 
any of SEQ ID NOS:28-36, 42, 44, 46, 48, 50, 
52, or 54. 



3. The polynucleotide of Claim 1 wherein the 
polynucleotide is DNA. 

4. The polynucleotide of Claim 1 wherein the 
polynucleotide is RNA. 
5. An isolated polynucleotide comprising a 

polynucleotide having at least 70% identity to a member 
selected from the group consisting of: 

(a) a polynucleotide encoding an enzyme encoded 
by the DNA contained in ATCC Deposit No. 97379, wherein 
said enzyme is selected from the group consisting of 
Ammonifex degensii KC4, Aquifex VF-5, M11TL, Methanococcus 
igneus KOL5, Thermococcus AEDII12RA, Thermococcus celer, 
Thermococcus CL-2, and Thermococcus GU5L5; 

(b) a polynucleotide complementary to the 
polynucleotide of (a) ; and 

(c) a polynucleotide comprising at least 15 
bases of the polynucleotide of (a) and (b) . 

6. A vector comprising the DNA of Claim 1 or Claim 2. 



7. A host cell comprising the vector of Claim 6. 

8. A process for producing a polypeptide comprising: 

expressing from the host cell of Claim 7 a polypeptide 
encoded by said DNA and isolating the polypeptide. 

9. A process for producing a recombinant cell 

comprising: transforming or transfecting the cell with the 
vector of Claim 6 such that the cell expresses the 
polypeptide encoded by the DNA contained in the vector. 






PCT/US97/10784 

WO 97/48416 

An enzyme of which at least a portion is coded 
[...] 
acid residues to the enzyme of (a). 

An enzyme of which at least a portion is coded 
for by a polynucleotide of Claim [...], and which is selected 
from the group consisting of: 
(a) an enzyme which comprises an amino acid sequence 
selected from the group of amino acid sequences set forth 
in SEQ ID NOS:28-36, 42, 44, 46, 48, 50 [...] 

(b) an enzyme which comprises at least [...] amino 
acid residues to the enzyme of (a). 
12. A method for hydrolyzing phosphate bonds 

[...] administering an effective amount of an enzyme 

[...] 50, 52, or 54. 
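
The sequence operations recited in the claims above (a complementary polynucleotide, the form in which T can also be U, fragments of at least 15 bases, and percent identity) can be illustrated with a short Python sketch. It is an illustration only: the function names are hypothetical, and the identity measure shown, matching positions over two already aligned sequences of equal length, is an assumption, since the claims do not prescribe how identity is computed.

```python
# Illustrative helpers only. The names and the identity definition are
# assumptions made for this sketch; they are not taken from the claims.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def to_rna(dna: str) -> str:
    """Write a DNA sequence as RNA by replacing every T with U."""
    return dna.upper().replace("T", "U")


def fragments(seq: str, length: int = 15):
    """Yield every contiguous fragment of exactly `length` bases."""
    for start in range(len(seq) - length + 1):
        yield seq[start:start + length]


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


# First 15 bases of the Figure 1 sequence, used purely as sample input.
sample = "ATGAGGGGGAGCGGA"
print(reverse_complement(sample))
print(to_rna(sample))
print(percent_identity(sample, "ATGAGGGGGAGCGGT"))
```

A real comparison of two genes would first align them with a dedicated alignment tool; the equal-length restriction here only keeps the sketch short.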
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1/n 



FIGURE 1. 



Ammonifex degensii KC4 Phosphatase (3A1A=3A2A) 
Complete gene sequence 



ATGAGGGGGAGCGGAGTGCGGATACTTCTCACCAACGATGACGGCATCTTTGCCGAGGGT 
1 MetArgGlySerGlyValArgIleLeuLeuThrAsnAspAspGlyIlePheAlaGluGly 

CTGGGGGCTCTGCGCAAGATGCTGGAGCCCGTGGCTACCCTTTACGTGGTGGCTCCGGAC 
21 LeuGlyAlaLeuArgLysMetLeuGluProValAlaThrLeuTyrValValAlaProAsp 

CGAGAGCGTAGCGCGGCCAGCCATGCTATCACCGTTCACCGCCCCCTGCGGGTGCGGGAG 
41 ArgGluArgSerAlaAlaSerHisAlaIleThrValHisArgProLeuArgValArgGlu 

GCGGGTTTTCGCAGCCCCAGGCTTAAAGGCTGGGTAGTGGACGGTACCCCGGCCGACTGC 
61 AlaGlyPheArgSerProArgLeuLysGlyTrpValValAspGlyThrProAlaAspCys 

GTCAAGCTGGGCCTGGAGGTACTTTTGCCCGAACGTCCAGATTTCCTGGTTTCGGGCATA 
81 ValLysLeuGlyLeuGluValLeuLeuProGluArgProAspPheLeuValSerGlyIle 

AACTACGGGCCCAACCTGGGTACCGACGTACTTTACTCCGGCACCGTCTCGGCGGCCATA 
101 AsnTyrGlyProAsnLeuGlyThrAspValLeuTyrSerGlyThrValSerAlaAlaIle 

gaaggggtaattaacggcattccctcggtggccgtatctttggccacgcckx:gggag^ 

121 GluGlyValIleAsnGlyIleProSerValAlaValSerLeuAlaThrArgArgGluPro 

gactatacctgggcggcccggttcgtcctggtcctgctggaggaactgcgaaaacaccaa 

141 AspTyrThrTrpAlaAlaArgPheValLeuValLeuLeuGluGluLeuArgLysHisGln 

ctgcccccaggaaccctgctcaacgtcaacgtgcccgacggggtgccccgcggggtcaag 

161 LeuProProGlyThrLeuLeuAsnValAsnValProAspGlyValProArgGlyValLys 

gtgaccaaactgggaagcgtacgctacgtcaacgtggtagactgccgcaccgaccctcgg 

181 ValThrLysLeuGlySerValArgTyrValAsnValValAspCysArgThrAspProArg 

gggaaggcttactactggatggcgggagaaccattggagctggacggcaacgactccgaa 

201 GlyLysAlaTyrTyrTrpMetAlaGlyGluProLeuGluLeuAspGlyAsnAspSerGlu 

accgacgtctgggcggtgcgagaaggctatatttccgtaacaccggtccagatcgacctt 

221 ThrAspValTrpAlaValArgGluGlyTyrIleSerValThrProValGlnIleAspLeu 

actaactacggcttcctggaagaactcaaaaaatggcgtttcaaggatatcttttcttct 

241 ThrAsnTyrGlyPheLeuGluGluLeuLysLysTrpArgPheLysAspIlePheSerSer 
TAA 

261 End 261 
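
The pairing of each nucleotide line with the amino acid line beneath it follows the standard genetic code. A minimal Python sketch, assuming the standard code applies to these genes and using hypothetical names (`CODON_TABLE`, `translate`), reproduces the first row of Figure 1 from its DNA:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code as one-letter amino acids, with codons enumerated
# in TCAG order (TTT, TTC, TTA, TTG, TCT, ...). "*" marks a stop codon.
ONE_LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "End",
}
CODON_TABLE = {
    "".join(codon): THREE_LETTER[aa]
    for codon, aa in zip(product(BASES, repeat=3), ONE_LETTER)
}


def translate(dna: str) -> str:
    """Translate DNA codon by codon into three-letter codes, as in the figures."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First nucleotide row of Figure 1 (60 bases, 20 codons).
row1 = "ATGAGGGGGAGCGGAGTGCGGATACTTCTCACCAACGATGACGGCATCTTTGCCGAGGGT"
print(translate(row1))
```

Run on the first row, the sketch prints MetArgGlySerGlyValArgIleLeuLeuThrAsnAspAspGlyIlePheAlaGluGly, and the final codon TAA translates to End. Checking a scanned row against its own translation in this way is one means of spotting characters misread in either line.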
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FIGURE 2 

Methanococcus igneus Kol5 Phosphatase (9A1A) 
Complete Gene Sequence 

ATGTTGGATATACTGCTTGTTAATGATGATCGCATTTATTCAAATGGATTAATAGCTTTG 
1 MetLeuAspIleLeuLeuValAsnAspAspGlyIleTyrSerAsnGlyLeuIleAlaLeu 

AAGGATGCATTATTGGAAAAATTTAATGCGAGGATTACTATTGTAGCCCCAACAAATCAG 
21 LysAspAlaLeuLeuGluLysPheAsnAlaArgIleThrIleValAlaProThrAsnGln 

CAGAGTGGTAITCGTAGGGCAATAAGTTTATTCGAGCCGTTAAGGATAACT^^CCA^ 
41 GlnSerGlyIleGlyArgAlaIleSerLeuPheGluProLeuArgIleThrLysThrLys 

AGAATATTAAATGAATGA 
261 ArgIleLeuAsnGluEnd 266 
FIGURE 3 



Thermococcus alcaliphilus AEDII12RA Phosphatase (18A) 
Complete Gene Sequence 



1 MetMetMetGluPheThrArgGluGlyIleLysAlaAlaValGluAlaLeuGlnGlyLeu 

21 GlyGluIleTyrValValAlaProMetPheGlnArgSerAlaSerGlyArgAlaMetThr 

41 "«HisArgProLeuArgAlaLysArgIleSerMetAsnGlyAlaLysAlaAlaTyrAla 

'I'^TGGAATGCCCGTTGATTGCGTTATCTTT^ 
61 LeuAspGlyMetProValAspCysValIlePheAlaMetAlaArgPheGlyAspPheAsp 

CTTGCAATAAGTGGTGTAAACTTGGGAGAAAACATC^^ 
81 LeuAlaIleSerGlyValAsnLeuGlyGluAsnMetSerThrGluIleThrValSerGly 

ACTGCAAGCGCTGCAATAGAGGCTGCAACCOUVGAGATCCC^ 
101 ThrAlaSerAlaAlaIleGluAlaAlaThrGlnGluIleProSerIleProIleSerLeu 

GAAGTTAATAGAGAAAAACACAAATTTGGTCAGGGCGAAGAGATTCAC^ 
121 GluValAsnArgGluLysHisLysPheGlyGluGlyGluGluIleAspPheSerAlaAla 

AAGTATTTCCTAAGAAAAATCGCAACGGCGGTTTTAAAGAGAGGCCrcCC^ 
141 LysTyrPheLeuArgLysIleAlaThrAlaValLeuLysArgGlyLeuProLysGlyVal 

GATATGCTGAACGTCAACGTCCCTTATCATGCAAATGAAAGGACAGAGATAGCTTTT 
161 AspMetLeuAsnValAsnValProTyrAspAlaAsnGluArgThrGluIleAlaPheThr 

CGCCTGGCAAGAAGGATGTATAGGCCTTCTATTGAAGAGCGCATAGACCCAAAGGGGAAT 
181 ArgLeuAlaArgArgMetTyrArgProSerIleGluGluArgIleAspProLysGlyAsn 

CCCTACTACTGGATAGTTGGAACTCAGTGCCCTAAGGAGGCATTAGAGCCGGGAACGGAT 
201 ProTyrTyrTrpIleValGlyThrGlnCysProLysGluAlaLeuGluProGlyThrAsp 

ATGTATGTAGTTAAAGTTGAGAGAAAAGTTAGCGTGACTCCAATAAACATTGATATGACA 
221 MetTyrValValLysValGluArgLysValSerValThrProIleAsnIleAspMetThr 

GCAAGAGTGAATTTAGACGAGATTAAAAGACTTTTAGAACTGTAG 
241 AlaArgValAsnLeuAspGluIleLysArgLeuLeuGluLeuEnd 255 



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FIGURE 4 



Thermococcus celer Phosphatase (25A1A) 
Complete Gene Sequence 



ATGAGAACCCTGACAATAAACACTGACGCGGAGGGGTTCGTTTTGAGGATTCTCCTGACG 
1 MetArgThrLeuThrIleAsnThrAspAlaGluGlyPheValLeuArgIleLeuLeuThr 20 

AACGACGATGGAAl^ACTCCAACGGACTGCGCGCCGCTGTGAAAGCCCTGAGTGAGC^ 
21 AsnAspAspGlyIleTyrSerAsnGlyLeuArgAlaAlaValLysAlaLeuSerGluLeu 40 

GGCGAAGTTTACGTCGTTGCCCCCCTCTTCCAGAGGAGCGCGAGCGGCAGGGCCATGACG 
41 GlyGluValTyrValValAlaProLeuPheGlnArgSerAlaSerGlyArgAlaMetThr 60 

CTCCACAGGCCGATAAGGGCCAAGCGCGTTGACGTTCCCGGCGCAAAGATAGCCTACGGA 
61 LeuHisArgProIleArgAlaLysArgValAspValProGlyAlaLysIleAlaTyrGly 80 

ATAGATGGAACTCCTACTGACTGCGTGATTTTCGCCATAGCCCGCTTCGGGAGCT^^ 
81 IleAspGlyThrProThrAspCysValIlePheAlaIleAlaArgPheGlySerPheGly 100 

TTAGCCGTGAGCGGGATTAACCTCGGCGAGAACCTGAGCACCGAGATAACAGTCTCAGGG 
101 LeuAlaValSerGlyIleAsnLeuGlyGluAsnLeuSerThrGluIleThrValSerGly 120 

ACGGCCTCCGCTGCCATAGAGGCCTCAACTCATGGAATTCCGAGCATAGCGATTA^ 
121 ThrAlaSerAlaAlaIleGluAlaSerThrHisGlyIleProSerIleAlaIleSerLeu 140 

GAGGTGGAGTGGAAGAAGACCCTCGGCGAGGGTGAGGGGGTTGACTTCTCGGTCTC 
141 GluValGluTrpLysLysThrLeuGlyGluGlyGluGlyValAspPheSerValSerThr 160 

CACTTCCTCAAGAGAATCGCGGGAGCCCTCTTGGAGAGAGGTCTTCCTGAGGGCGTTC 
161 HisPheLeuLysArgIleAlaGlyAlaLeuLeuGluArgGlyLeuProGluGlyValAsp 180 

ATGCTCAACGTCAACGTTCCGAGCGACGCGACGGAGGAAACGGAGATAGCAATCACCCGC 
181 MetLeuAsnValAsnValProSerAspAlaThrGluGluThrGluIleAlaIleThrArg 200 

TTAGCCCGGAAGCGCTACTCCCCAACGGTCGAGGAGAGGATTGACCCCAAGGGCAACCCC 
201 LeuAlaArgLysArgTyrSerProThrValGluGluArgIleAspProLysGlyAsnPro 220 

TACTACTGGATTGTCGGCAAACTTGTCCAAGACTTCGAGCCAGGGACAGATGCCTACGCC 
221 TyrTyrTrpIleValGlyLysLeuValGlnAspPheGluProGlyThrAspAlaTyrAla 240 

CTGAAGGTCGAGAGGAAGGTCAGCGTCACGCCGATAAACATAGATATGACTGCGAGGGTG 
241 LeuLysValGluArgLysValSerValThrProIleAsnIleAspMetThrAlaArgVal 260 

GACTTTGAGGAGCTTGTAAGGCTTCTGTGGGTGTAA 
261 AspPheGluGluLeuValArgValLeuTrpValEnd 272 
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FIGURE 5A 



Thermococcus GU5L5 Phosphatase (26A1A) 
Complete Gene Sequence (Part 1 of 2) 

ATGAAAGGAAAGTCTCTTGTTAGCGGTCTCriXrrTGGGTC^^ 
1 MetLysGlyLysSerLeuValSerGlyLeuLeuLeuGlyLeuLeuIleLeuSerLeuIle 20 

TCATTCCAGCCAAGCTTTGCATACTCCCCACACGGCGGTGTCAAAAACATCATAATCCTG 
21 SerPheGlnProSerPheAlaTyrSerProHisGlyGlyValLysAsnIleIleIleLeu 40 

GTTGGAGACGGCATCXXriXTTGGGCATGTAGAAATTACAAAGCTCGTTTATG^ 
41 ValGlyAspGlyMetGlyLeuGlyHisValGluIleThrLysLeuValTyrGlyHisLeu 60 

AACATGGAAAACTTTCCAGTTACTGGATTTGAGCTTACTGATTCCCT 
61 AsnMetGluAsnPheProValThrGlyPheGluLeuThrAspSerLeuSerGlyGluVal 80 

ACACSATTCTGCTGCGGCAGGAACTGCAATATCCACraiAGCTAAAAC^ 
81 ThrAspSerAlaAlaAlaGlyThrAlaIleSerThrGlyAlaLysThrTyrAsnGlyMet 100 

ATTTCAGTAACCyLftCATAACCGGAAAGATAGTTAACraAACAACCCTACTTGAAff 
101 IleSerValThrAsnIleThrGlyLysIleValAsnLeuThrThrLeuLeuGluValAla 120 

CAAmwxrrrOGCSAAGTCAACAGGGCTGGTCACCACAACAAGGATT^ 
121 GlnGluLeuGlyLysSerThrGlyLeuValThrThrThrArgIleThrHisAlaThrPro 140 

GCAGTTTTTGCGTCCCATGTCCCAGATAGGGATATOGAGGGCSGAGATACCXy^^ 
141 AlaValPheAlaSerHisValProAspArgAspMetGluGlyGluIleProLysGlnLeu 160 

ATAATGCACAAAGTTAACGTCTTGTTGGGTGGTGGAAGGGAGAAATTCGATGAGAAAAAT 
161 IleMetHisLysValAsnValLeuLeuGlyGlyGlyArgGluLysPheAspGluLysAsn 180 

TTGGAGCTCX;CCAAAAACCAGGGATACAAAGTAGTTTTCAaiAAGGAAGAGCT^^ 
181 LeuGluLeuAlaLysLysGlnGlyTyrLysValValPheThrLysGluGluLeuGluLys 200 

GTTGAAGGAGATTATGTCCrAGCACTCTTTCCAGAAACTCACATCCCrrAC^ 
201 ValGluGlyAspTyrValLeuGlyLeuPheAlaGluSerHisIleProTyrValLeuAsp 220 

AGAAAACCCGATGATGrrGGACrrrrAGAAATGGCCAAAAAGGCAATTTCAATACTCGAG 
221 ArgLysProAspAspValGlyLeuLeuGluMetAlaLysLysAlaIleSerIleLeuGlu 240 

AAGAACCCGAGCGGATTCTTTCTCATGCTTGAGGGCGGAAGGATTGACCAT^ 
241 LysAsnProSerGlyPhePheLeuMetValGluGlyGlyArgIleAspHisAlaAlaHis 260 

GGAAACGATCTCGCATCGCrrCTTGCAGAAACTAAGGAGTTTGACCATO ^M^ TCAGATAC 
261 GlyAsnAspValAlaSerValValAlaGluThrLysGluPheAspAspValValArgTyr 280 

GTGCTGGAATATCCCAACAACACCCGAGATACCTTGGTAATAGTCCTTGCCGATCACGAA 
281 ValLeuGluTyrProLysLysArgGlyAspThrLeuValIleValLeuAlaAspHisGlu 300 

ACTGGAGCTCTTCCAATAGCTCTAACCTATGGAAATGCAATCCATGAACATCCCATAAGA 
301 ThrGlyGlyLeuAlaIleGlyLeuThrTyrGlyAsnAlaIleAspGluAspAlaIleArg 320 



AAAATAAAAGCAACCACtrn'C'.AfXIATCCCCAAAGAOGTTAAGCCACCGACTACTGTAAAA 
321 LysIleLysAlaSerThrLeuArgMetProLysGluValLysAlaGlySerSerValLys 340 
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FIGURE 5B 



Thermococcus GU5L5 Phosphatase (26A1A) 
Complete Gene Sequence (Part 2 of 2) 



GACTCCTCAAAGGTATGCCGGATTTGTCCCAACAGACZyU^GAAGTCAGTATATTGAGAAT 
341 GluSerSerLysValCysArgIleCysProAsnArgGlyArgSerGlnTyrIleGluAsn 360 

GCGCTGCACTCGACAAACAAGTATGCCCTCTCAAATGCAGTAGCCGATGTTATAAACAGG 
361 AlaLeuHisSerThrAsnLysTyrAlaLeuSerAsnAlaValAlaAspValIleAsnArg 380 

CGTATTGCTGTTGGArrCACCTCCTATGAGCATACAGGAGTTCCA<r^ 
381 ArgIleGlyValGlyPheThrSerTyrGluHisThrGlyValProValProLeuLeuAla 400 

TACGGTCCCGGGGCAGAGAACTTCAGAOGmCTTACACCATGTGGAT^ 
401 TyrGlyProGlyAlaGluAsnPheArgGlyPheLeuHisHisValAspThrAlaArgLeu 420 

GTTGCAAAGTrAATOncrrTGGAAGGAGGAATATTCC^ 
421 ValAlaLysLeuMetLeuPheGlyArgArgAsnIleProValThrIleSerSerValSer 440 

AGTOTAAGGGAGACATAACCGGTGATTACAGGGTTGATGAGAAGGATGOT 
441 SerValLysGlyAspIleThrGlyAspTyrArgValAspGluLysAspAlaTyrValThr 460 

CTCATCATXnriCTCGGAGAAAAAGTGGATAATGAAATIX;^^ 
461 LeuMetMetPheLeuGlyGluLysValAspAsnGluIleGluLysArgValAspIleAsp 480 

AACAACOGCATGGrrcACTTAAATGACGTCATGTTGATTC^ 
481 AsnAsnGlyMetValAspLeuAsnAspValMetLeuIleLeuGlnGluAlaEnd 498 
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FIGURE 6A 



OC9a Phosphatase (27A3A) 
Complete Gene Sequence (Part 1 of 2) 

ATGCCAAGAAATATCCCCGCrcTATGCGCCCTGGCCCCTTTGTTACKXnxrGGCCr^^ 
1 MetProArgAsnIleAlaAlaValCysAlaLeuAlaAlaLeuLeuGlySerAlaTrpAla 20 

gccaaacttgccgtctacccctacgacggagccgctttcctcgccxxx;cack:gcttcgat 

21 AlaLysValAlaValTyrProTyrAspGlyAlaAlaLeuLeuAlaGlyGlnArgPheAsp 40 

ttgcgcatagaagcctccgagctgaaaggcaatttaaaggcttaccgcatcaccctggac 

41 LeuArgIleGluAlaSerGluLeuLysGlyAsnLeuLysAlaTyrArgIleThrLeuAsp 60 

GGCXAGCCrKnXXX:GGGCXTayVGCAAACCGCCX:AGGGG^ 
61 GlyGlnProLeuAlaGlyLeuGluGlnThrAlaGlnGlyAlaGlyGlnAlaGluTrpThr 80 



81 LeuArgGlyAlaPheLeuArgProGlySerHisThrLeuGluValSerLeuThrAspAsp 100 

GCTGGGGAGAGCAGGAAGAGCGTACGrrGGGAGGCTCGGCAGAACCTTCGCTTC^ 
101 AlaGlyGluSerArgLysSerValArgTrpGluAlaArgGlnAsnLeuArgLeuProArg 120 

GCCX5CaU^GAATC^XyVTTCTCTTCATTGGCGA^ 
121 AlaAlaLysAsnValIleLeuPheIleGlyAspGlyMetGlyTrpAsnThrLeuAsnAla 140 

GCCCGCATCATCGCCAAAGGCTTTAACCCCGAAAACOGTATGCCCAACGGAAACXr^^ 
141 AlaArgIleIleAlaLysGlyPheAsnProGluAsnGlyMetProAsnGlyAsnLeuGlu 160 

ATCGAGAGTGGTTACGGTGGGATGGCTACCGTCACTACCGOCAGCTTTGATAGCTTCATC 
161 IleGluSerGlyTyrGlyGlyMetAlaThrValThrThrGlySerPheAspSerPheIle 180 

GCCGACTCAGCTAACTCGGCTTCTTCCATCATGACCGGGCAGAAGGTGCAGGTGAATGCC 
181 AlaAspSerAlaAsnSerAlaSerSerIleMetThrGlyGlnLysValGlnValAsnAla 200 

CTCAACGTrrACCCATCAAACCTCAAAGATACCCTGGCCTACCCCCGGATCGAAACCCTA 
201 LeuAsnValTyrProSerAsnLeuLysAspThrLeuAlaTyrProArgIleGluThrLeu 220 

GCGGAGATGCTCAAGCGGGTACGCGGGGCCAGCATTCGGGTAGTGACCACCACCTTCGGC 
221 AlaGluMetLeuLysArgValArgGlyAlaSerIleGlyValValThrThrThrPheGly 240 

ACCGACGCTACCCCGGCTTCACTCAACGCCCATACCCGCCGCCGCGCTGATTACCAGGCT 
241 ThrAspAlaThrProAlaSerLeuAsnAlaHisThrArgArgArgGlyAspTyrGlnAla 260 

ATCGCCGACATGTACTTTGGTAGAGGCGGCTTCCGTGTTCCCTTGGATGTGATGC^^ 
261 IleAlaAspMetTyrPheGlyArgGlyGlyPheGlyValProLeuAspValMetLeuPhe 280 

GCTCCTTCACGCGACTTCATCCCCCAGAGCACCCCTGCCTCGCGCCCCAACGATAGCACG 
281 GlyGlySerArgAspPheIleProGlnSerThrProGlySerArgArgLysAspSerThr 300 

GACTGGATTGCCCAATCCCACAAGCTCCCCTACACCTTTCTCAGCACCCGCACCCAGCTC 
301 AspTrpIleAlaGluSerGlnLysLeuGlyTyrThrPheValSerThrArgSerGluLeu 320 



CTC;f;CGCCrAAACX'CACCC;A1AAOCTtnTTCC:CCTCTTCAACATTCACAAC*T'rCtXXAG^ 
321 LeuAlaAlaLysProThrAspLysLeuPheGlyLeuPheAsnIleAspAsnPheProSer 340 
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FIGURE 6B 



OC9a Phosphatase (27A3A) 
Complete Gene Sequence (Part 2 of 2) 



[...] 



CTCGAGAAGCCTTXA 
591 LeuGluLysProEnd 595 
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FIGURE 7 



M11TL Phosphatase (29A1A=29A2A) 
Complete Gene Sequence 

ATGTATAAATGGATTATTGAGGGTAAGCTTGCCCAAGCACCTTTTCCAAGCCTA<Knx;^ 
1 MetTyrLysTrpIleIleGluGlyLysLeuAlaGlnAlaProPheProSerLeuGlyGlu 20 

CTAGCCGATCTCAAAAGACTTTTCGACGCCATTATTGTTCTTACAATGCCGCATGAACAA 
21 LeuAlaAspLeuLysArgLeuPheAspAlaIleIleValLeuThrMetProHisGluGln 40 

CCGCTTAATGAGAAATATATCGAGATATTAGAGAGCCATGGATTCCAAGTCCTCCATGTC 
41 ProLeuAsnGluLysTyrIleGluIleLeuGluSerHisGlyPheGlnValLeuHisVal 60 

CCCACGCTCGACTTTCATCCTTTAGAACTCTTCGACCTTTTGAAAACAAGCAT^ 
61 ProThrLeuAspPheHisProLeuGluLeuPheAspLeuLeuLysThrSerIlePheIle 80 

GATGAAAACCTGGAGAGATCCCACAGAGTGCTTGTCCACTGCATGGGAGGCATAGGCCGG 
81 AspGluAsnLeuGluArgSerHisArgValLeuValHisCysMetGlyGlyIleGlyArg 100 

AGCGGGCTTGTAACTGCTGCGTACTT'AATATTCAAAOGTTATGATATTTACGACGCGGTA 
101 SerGlyLeuValThrAlaAlaTyrLeuIlePheLysGlyTyrAspIleTyrAspAlaVal 120 

AAGOVTGrcAGAACGGTAGTGCCTGGTGCTATTGAAAACAGAGGGCAAGCGTTAATGm 
121 LysHisValArgThrValValProGlyAlaIleGluAsnArgGlyGlnAlaLeuMetLeu 140 

GAGAACTACTATACCCTGCTCAAAAGTTTCAACAGAGAGTTGCTGAGAGACTACGGGAAG 
141 GluAsnTyrTyrThrLeuValLysSerPheAsnArgGluLeuLeuArgAspTyrGlyLys 160 

AAAATTTTCACGCTCGGTGACCCGAAGGCGGTTCTCCACGCrKrTAAGACGACTCAGT^ 
161 LysIlePheThrLeuGlyAspProLysAlaValLeuHisAlaSerLysThrThrGlnPhe 180 

ACGATTGAACTCTTAAGCAACTTACACGTCAACGAGGCGTTTTCAATCAGTGCGAT^ 
181 ThrIleGluLeuLeuSerAsnLeuHisValAsnGluAlaPheSerIleSerAlaMetAla 200 

CAATCACTGCTCCACTTTCACGACGTAAAAGTCCGCTCTAAACTGAAAGAAGTATTCG^ 
201 GlnSerLeuLeuHisPheHisAspValLysValArgSerLysLeuLysGluValPheGlu 220 

AACATGGAATTCTCATCCGCCTCAGAGGAGGTTCTGTCATTTATTCACCTACTCGATT^ 
221 AsnMetGluPheSerSerAlaSerGluGluValLeuSerPheIleHisLeuLeuAspPhe 240 

TATCAGGATGGCAGGGTTCTTrrAACCATTTACCATTATCTCCCCGATAGGGTGGATT^ 
241 TyrGlnAspGlyArgValValLeuThrIleTyrAspTyrLeuProAspArgValAspLeu 260 

ATTTTATTGTGTAACTGGGGTTGTGATAAAATACTTCAACTCTCCTCTTCAGCGAAGAAA 
261 IleLeuLeuCysLysTrpGlyCysAspLysIleValGlnValSerSerSerAlaLysLys 280 

ACCGTTGAGAACt'TTGTACCAAGAAAGGrrTCCCTATtCTGGGCTAATTACTTAGACTAT 
281 ThrValGluLysLeuValGlyArgLysValSerLeuSerTrpAlaAsnTyrLeuAspTyr 300 



CTPIAC; 
301 ValEnd 
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FIGURE 8 



Thermococcus CL-2 Phosphatase (30A1A) 
Complete Gene Sequence 



ATGAGAATCCTCCTCACCAACGACGACGGCATCTATTCCAACGGTCTGCGCGCGG^ 
1 MetArgIleLeuLeuThrAsnAspAspGlyIleTyrSerAsnGlyLeuArgAlaAlaVal 20 

AAGGGCCTGAGCGAGCTCGGCGAGGTCTACGTCGTCGCCCCGCTCTTCCAGAGGAGCGrc 
21 LysGlyLeuSerGluLeuGlyGluValTyrValValAlaProLeuPheGlnArgSerAla 40 

AGCGGTCGGGCGATGACCCTACACAGGCCGATAAGGGCAAAGAGGGTTCAC^^^ 
41 SerGlyArgAlaMetThrLeuHisArgProIleArgAlaLysArgValAspValProGly 60 
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FIGURE 9 



Aquifex VF-5 Phosphatase (34A1A) 
Complete Gene Sequence 

ATGGAAAACTTAAAAAAGTACCTAGAAGTTGCAAAAATAGCCGCGCTCGCGGGTGGGCAG 
1 MetGluAsnLeuLysLysTyrLeuGluValAlaLysIleAlaAlaLeuAlaGlyGlyGln 20 

GTTCTGAAAGAAAACTTCGGAAAGGTAAAAAAGGAAAACATAGAGGAAAAAGGGGAAAAG 
21 ValLeuLysGluAsnPheGlyLysValLysLysGluAsnlleGluGluLysGlyGluLys 40 

GACTTTGTAAGTTACGTGGATAAAACTTCAGAGGAAAGGATAAAGGAGGTGATACTCAAG 
41 AspPheValSerTyrValAspLysThrSerGluGluArgIleLysGluValIleLeuLys 60 

TTCTTTCCCGATCACGAGGTCGTAGGGGAAGAGATGGGTGCGGAGGGAAGCGGAAGCGAA 
61 PhePheProAspHisGluValValGlyGluGluMetGlyAlaGluGlySerGlySerGlu 80 

TACAGGTGGTTCATAGACCCCCrTGACGCCACAAAGAACTACATAAACGGTTTTCCCATC 
81 TyrArgTrpPheIleAspProLeuAspGlyThrLysAsnTyrIleAsnGlyPheProIle 100 

TTTGCCGTATCAGTGGGACTTGTTAAGGGAGAAGAGCCAATTGTGGGTGCGGTTTACCTT 
101 PheAlaValSerValGlyLeuValLysGlyGluGluProIleValGlyAlaValTyrLeu 120 

C(nTACTTTGACAAGCTTTACTGGGGTGCTAAAGGTCTCGGGGCTTACGTAAACGGAAA^ 
121 ProTyrPheAspLysLeuTyrTrpGlyAlaLysGlyLeuGlyAlaTyrValAsnGlyLys 140 

AGGATAAAGGTAAAGGACAATGAGAGrrTAAAGCACGCCGGAGTGGTTTACGGATTTCCC 
141 ArgIleLysValLysAspAsnGluSerLeuLysHisAlaGlyValValTyrGlyPhePro 160 

TCTAGGAGCAGGAGGGACATATCTATCTACTTGAACATATTCAAGGATGTCTTTTACGAA 
161 SerArgSerArgArgAspIleSerIleTyrLeuAsnIlePheLysAspValPheTyrGlu 180 

GTTGGCTCTATGAGGAGACCCGGGGCTGCTGCGGTTGACCTCTGCATCGTGGCGGAAGGG 
181 ValGlySerMetArgArgProGlyAlaAlaAlaValAspLeuCysMetValAlaGluGly 200 

ATATTTGACGGGATGATGGAGTTTGAAATGAAGCCGTGGGACATAACCGCAGGGCTTGTA 
201 IlePheAspGlyMetMetGluPheGluMetLysProTrpAspIleThrAlaGlyLeuVal 220 

ATACTGAAGGAAGCCGGGGGCGTTTACACACrrGTGGGAGAACCCTTCGGAGTTTCGGAC 
221 IleLeuLysGluAlaGlyGlyValTyrThrLeuValGlyGluProPheGlyValSerAsp 240 

ATAATTGCGGGCAACAAAGCCCTCCACGACTTTATACTTCAGGTAGCCAAAAAGTATATG 
241 IleIleAlaGlyAsnLysAlaLeuHisAspPheIleLeuGlnValAlaLysLysTyrMet 260 

GAAGTGGCGGTGTGA 
261 GluValAlaValEnd 265 
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